‘Statistical Data Analysis Is One of the Basic Skills Required of Modern People’
September 15 is the deadline to submit application documents for the Master’s in Applied Statistics with Network Analysis, Russia’s first English-language online programme in applied statistics. The HSE University programme trains specialists capable of working with data for strategising business processes and analysing the system of interactions between companies and people.
Ivan Klimov, the academic supervisor of the online programme, believes that while it may be banal to talk about the data revolution, the fundamental nature of the process is indisputable.
— Do you agree that working with and analysing data play a paramount role in social and business processes today?
— I would like to begin with a banal statement: we live in an era of explosive data growth. We leave a lot of digital footprints about our behaviour, interests, intentions, and connections. At the same time, various research teams create measurement systems, collect and model data on a wide variety of processes in which a person may be only indirectly present or not involved at all: climate issues, geological data, astrophysics, the invention of materials, the development and testing of new drugs, epidemiological research, etc. People tend to make decisions based on data and data analysis. And paradoxically, data is constantly lacking. Think, for example, of the discussions about climate change and the factors that affect it. Therefore, my first thesis is that statistical data analysis is part of the set of basic skills for modern people. It’s like being able to go camping and play guitar in the sixties, or have a LiveJournal account in the 2000s. And today, it is embarrassing not to know Python or R.
— The programme is focused on network analysis. Why was this field chosen out of the whole arsenal of data analytics?
— Network analysis is a kind of fifth dimension when working with almost any data or data set. It focuses on interaction, on connections and relationships. Quite often, network analysis is confused with the analysis of social networks such as Twitter, VK, or YouTube. But to me, it is closer to biochemistry: antibodies or proteins in general are made up of amino acids and specific bonds between many different elements. Anything can be what is called a ‘node’ in network analysis: not only people, but also minerals, planets, enzymes. The main thing is that some kind of connection exists between them—that there are links. And then we can set any known characteristics on the nodes and watch how the nature and structure of connections change depending on the selected condition. By the way, we have a Telegram channel called Nodes and Links. It features many examples of where and how network analysis is used.
So, network analysis does not override other methods of statistical analysis. It can extend them in unexpected and sometimes very useful ways. And it just so happened that the nodes in many studies are people and organisations, simply because they cannot live without each other: alone, they cannot want, act, achieve, or change.
— If network analysis is the fifth dimension, what are the other four?
— The first dimension is the focus of a study: its problem, goal, key research question. Without goals, there is no result. The second dimension is the object, the carrier of the problem under study. It is the spot where I actually lost my keys, not the streetlight where the light happens to be better. The third dimension is time. What time period are we interested in? What happened to people in a certain period? When did the changes or aggravation of a problem begin? The fourth dimension is data and its structure. ‘Garbage in, garbage out’ is a favourite saying of Valentina Kuskova, the founder and first head of our laboratory. And the fifth dimension is connections. We have already talked about them.
— There is a sea of data out there. How severe is the problem of data quality? How accurate is the data that modern analysts work with?
— Our data about the world is no more accurate than the tools we use to obtain it. If this data is inaccurate, incomplete, or has some systematic bias, then the conclusions will be erroneous and biased. And it is terrible if we do not even know the type and amount of distortion. Therefore, in data science, it is very important not only to choose the right model or analysis strategy, but also to correctly plan data collection procedures, impartially assess the quality of the resulting data set, and have an end-to-end vision of the entire research process: searching for a problem, setting a problem, formulating hypotheses, planning an experiment or collecting data, selecting analysis tools, and correctly interpreting the results of statistical analysis. Building this understanding and accumulating this experience is the core goal of our Master’s programme.
— In your opinion, what is the most interesting part of the programme?
— I will formulate perhaps not the main point, but one that is important to me personally. It is the applied nature of our approach to learning, the focus on solving practical tasks. It is also the participation in projects run by the laboratory and HSE University staff, as well as by our colleagues in the research industry.
We try to make sure that students can build up a good portfolio of projects for future employment.
For example, three students examined the sustainability of social entrepreneurship as a form of small and medium-sized business. They got so engrossed in it that they received support from the ZIRCON research group and HSE University’s Centre for Social Entrepreneurship and Social Innovation. And the Our Future Foundation and the Rybakov Foundation allocated grants for the research (small ones, but still). Together with the students, we did a study for RIA Novosti on how people interact with interactive videos. One graduate student defended an excellent thesis based on eye-tracking, semantic differential, and a specially designed type of interview. The client, a development team, saved 14 million roubles, which they had intended to spend on what turned out to be a development with no chance of success.
Here's another case. A colleague involved in manager training and development received an anonymised data set with testing results for almost 2,000 people. These were top managers from large Russian companies. The task was interesting: to understand if there are any gender differences in the set of ‘destructors’—reactions that occur in a situation of stress, pressure, or uncertainty. Every person has them. And we have the opportunity to pull patterns from a rather hard-to-reach group—real leaders and top-level managers. Interesting, isn’t it?
— Who does the programme target?
— There are three types of students in our orbit. The first are already-established professionals in the field of data analysis and data mining. They want to add something to their professional expertise or enhance it. Financial analysts, employees of IT companies, HR specialists, healthcare workers, and university professors regularly come to us. There are also business owners, albeit rarely.
The second type of students are recent graduates of bachelor’s or even master’s programmes. They want to assemble their own unique configuration of basic vocational education—for example, to combine subject knowledge in political science, international relations, organisational management or state and municipal administration with skills in the field of data analysis and research work. And of course, there can be specialists from other subject areas, particularly sociologists. The Faculty of Social Sciences at HSE University recently launched a fantastic Bachelor’s programme in Computational Social Sciences. Our Master’s programme is the logical continuation of their professional development.
And the third type is international students. They are attracted by the programme’s structure, its accessibility, the possibility to build individual tracks. And of course, network analysis. There are not many centres in Russia—or indeed in the world—where in addition to mastering the methodology of network analysis, one can also gain experience in specialised programmes and see how real researchers solve such problems in real research.
That is why we greatly value our professional relations with our colleagues from the University of Ljubljana, with our supervisors Anuška Ferligoj and Vladimir Batagelj. They are at the forefront of network analysis development and among the leading experts in this field.
— How sought-after are graduates of the programme? What kinds of careers can they build?
— The demand for professionals who can work with data, extract it, evaluate its completeness and quality, make calculations, and build models will only increase. The amount of data is not going to get smaller, and the tasks for analysis will only multiply and become more complicated. Analytical tasks, data modelling and statistical analysis have already become an integral feature of almost any business—from price forecasting to logistics analysis, from HR analytics to analysing the patterns of information dissemination in various social groups and environments. Graduates of the programme have two of the most clear-cut career tracks: further professional development in the corporate field in teams of analysts and marketers, and an academic track, which means that you will have your say in science and research methods.
But there are other opportunities as well. They involve the ability to make decisions, set tasks, and manage people and teams. It is no longer enough just to know data analysis methods. You need to be able to see the situation as a whole, for example, in a company, or see how some development programme is being implemented, how markets are developing. These are completely different skills and abilities. However, I am sure that the ability to work with data, to see the main focus without being distracted by insignificant things, and to rely on analytical materials is an excellent basis for a successful career as a leader and manager.
Ekaterina Melianova and Artem Volgin, both graduates of the Master’s Programme Applied Statistics with Network Analysis, took second place in two CDP competitions: the Unlocking Climate Solutions Kaggle Competition and the COVID-19 Symptom Data Challenge. In the coronavirus-related data competition, the HSE graduates outperformed professors and PhD students of Virginia Polytechnic Institute and State University, the University of Washington, the Massachusetts Institute of Technology (MIT), and other foreign universities.
What do staff efficiency, the power of the Medici family, and the Ebola epidemic have in common? They can all be studied with network analysis. In 2017, HSE launched a new English-taught master’s programme, ‘Applied Statistics with Network Analysis’. Valentina Kuskova, head of the International Laboratory for Applied Network Research, told the HSE news service how network research works in social studies.
Specialists from the HSE’s Nizhny Novgorod campus plan to create a new system of structuring data and accounting of webpages. The Laboratory of Algorithms and Technologies for Networks Analysis has won a grant from the Russian Science Foundation to study ‘Clustering and Search Techniques in Large Scale Networks.’
The HSE has set up a new International Laboratory for Applied Network Analysis. The lab’s Academic Supervisor, Indiana University Professor Stanley S. Wasserman, and Valentina Kuskova, Deputy Dean for International Relations at the HSE Faculty of Management, talked to the News Service about the aims of the new laboratory.