
Method development and analysis of the socio-economic development of the federal subjects of the Russian Federation based on statistical analysis and semantic analysis of big data

2020

Goal of research

To develop a method for, and carry out an analysis of, the socio-economic development of the federal subjects of the Russian Federation based on statistical analysis and semantic analysis of big data.

Methodology

The project ranks the subjects of the Russian Federation by selected indicators of official statistics and by indicators derived from semantic analysis of big text data, then identifies, quantifies and analyzes the discrepancies between the two sets of rankings. For the semantic analysis of big text data, sentiment is chosen as the main indicator of the socio-economic situation in the subjects of the Russian Federation. This indicator shows the difference between the numbers of positive and negative publications on the subject.
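
The sentiment indicator can be illustrated with a minimal sketch. It assumes each publication has already been assigned a region and a polarity label ("positive", "negative" or "neutral") by some classifier, and it uses the normalized form given under Results of research (difference divided by the total number of publications about the region). The function name, label strings and input layout are hypothetical and are not taken from the project or from the iFORA system.

```python
from collections import Counter, defaultdict


def sentiment_by_region(publications):
    """Return {region: (positive - negative) / total} for labeled publications.

    `publications` is an iterable of (region, label) pairs; the label
    strings are an assumption made for this illustration.
    """
    counts = defaultdict(Counter)
    for region, label in publications:
        counts[region][label] += 1

    result = {}
    for region, c in counts.items():
        total = sum(c.values())  # at least 1 for every region seen
        result[region] = (c["positive"] - c["negative"]) / total
    return result


sample = [
    ("Region A", "positive"),
    ("Region A", "positive"),
    ("Region A", "negative"),
    ("Region A", "neutral"),
    ("Region B", "negative"),
    ("Region B", "neutral"),
]
scores = sentiment_by_region(sample)
```

With this normalization the value lies between -1 (only negative publications) and 1 (only positive publications), so regions with very different publication volumes remain comparable.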

Empirical base of research

  • Open sources of large arrays of news publications relevant to the socio-economic development of the subjects of the Russian Federation in the iFORA database (more than 10 million publications);

  • Official statistics on the indicators set out in the Decree of the President of the Russian Federation No. 193 of 25.04.2019.

Results of research

1. The methodology of analysis of the socio-economic development of the federal subjects of Russian Federation based on statistical analysis and semantic analysis of big data has been developed;

2. Official statistical indicators compiled at the regional level have been selected:

  • Number of high-performance jobs in the non-budgetary sector of the economy;

  • Number of employees in small and medium-sized businesses, including individual entrepreneurs;

  • Labor productivity in basic non-resource sectors of the economy;

  • Level of real average monthly salary;

  • Volume of investments in fixed assets, excluding investments of infrastructure monopolies (Federal projects) and budget allocations of the Federal budget;

  • Poverty level;

  • Life expectancy at birth;

  • Natural population growth;

  • Number of families who have improved their living conditions;

  • Level of housing affordability;

  • Percentage of cities with a favorable urban environment;

  • Environmental quality;

  • Education level;

  • Percentage of regional highways and highways in urban agglomerations that meet regulatory requirements, taking into account congestion.

3. A set of relevant regional-level indicators based on semantic analysis of big data, together with methods for calculating them, has been designed:

  • Sentiment has been selected as the main indicator of the socio-economic situation in the federal subjects of Russia;

  • Sentiment is an indicator that shows the difference between the number of positive and negative publications about a topic, normalized by the total number of publications about the region for the period under review;

  • Sentiment is calculated by automated, machine-learning-based analysis of more than 10 million news publications for the period under review.

4. The federal subjects of the Russian Federation have been ranked by selected indicators of official statistics and by indicators formed on the basis of semantic analysis;

5. Discrepancies between official statistics and indicators generated on the basis of semantic analysis have been found and analyzed, including a quantitative assessment of the gaps between the ranks.
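
The rank comparison in items 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the project's actual procedure: both indicators are taken to be "higher is better" (an indicator such as poverty level would need its sign flipped first), ties are broken alphabetically by region name, and the function names and sample values are hypothetical.

```python
def rank_desc(values):
    """Rank regions 1..n, highest value first; ties broken by region name."""
    ordered = sorted(values, key=lambda region: (-values[region], region))
    return {region: position + 1 for position, region in enumerate(ordered)}


def rank_gaps(official, sentiment):
    """Return {region: official rank - sentiment rank} for shared regions.

    A positive gap means the region ranks better by sentiment than by the
    official indicator; a negative gap means the opposite.
    """
    common = official.keys() & sentiment.keys()
    official_rank = rank_desc({r: official[r] for r in common})
    sentiment_rank = rank_desc({r: sentiment[r] for r in common})
    return {r: official_rank[r] - sentiment_rank[r] for r in common}


# Made-up values for three hypothetical regions.
official = {"Region A": 100.0, "Region B": 80.0, "Region C": 60.0}
sentiment = {"Region A": 0.1, "Region B": 0.3, "Region C": -0.2}
gaps = rank_gaps(official, sentiment)
```

Regions with the largest absolute gaps are the natural candidates for the in-depth analysis mentioned in the next section.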

Level of implementation, recommendations on implementation or outcomes of the implementation of the results

The results of the study can be used for rapid assessment of the current situation in the regions, which can be based not only on quantitative data, but also on qualitative data on key topics of news publications. These topics suggest directions for in-depth analysis of socio-economic development in the regions.

The methodology and results of the research can also be used for monitoring the comprehensive socio-economic development of the federal subjects of the Russian Federation using artificial intelligence technologies and big data processing methods. The monitoring may include the preparation of socio-economic profiles of the regions using both traditional statistical data and big data, as well as semantic profiles of the regions based on professional media, social networks, messengers and other sources of information. In addition, big data processing technologies, artificial intelligence and expert methods can be used to analyze and forecast the impact of the COVID-19 pandemic and the associated economic crisis on the socio-economic development of the regions.