Analysis of Time Series of Household Spendings

Student: Danilin Rodion

Supervisor: Timofey Shevgunov

Faculty: Graduate School of Business

Educational Programme: Business Informatics (Bachelor)

Year of Graduation: 2021

The purpose of this paper is to analyze data on household consumption in different countries and to develop a method for grouping and clustering countries based on economic indicators such as:
● Gross domestic product per capita
● Actual individual consumption
● Household final consumption
● Human Development Index
● Genuine Progress Indicator
● Gross national income

This will allow the implementation of a decision-making approach built on information systems calculations and data analysis methodology. To achieve this goal, time series models and statistical analysis are used, carried out with the statistical package gretl and Python code written in Jupyter Notebook.

The tasks set for this research were:
● Analyzing and selecting the metric best suited to represent the state of a country.
● Finding up-to-date data and preprocessing it.
● Constructing the final time series models, taking into account the drawbacks of the primary models.
● Testing the models for adequacy and accuracy.
● Analyzing the results obtained by clustering countries through model building.

As part of my pre-diploma internship, I selected a suitable metric for analyzing the quality of life in a country, preprocessed the data and obtained a clustering of countries based strictly on constructed ARIMA models. When the resulting groups were examined, the clusters distinguished countries not by their current well-being but by the dynamics of their development. This approach is applicable not only to household expenditures by country: it can be used to analyze the dynamics of any metric presented as a time series, whenever an indicator is compared across categories (e.g. the average Unified State Exam (USE) score by subject, sales by product category, etc.).
Further steps to refine the presented clustering method include using additional time series metrics, such as permutation entropy, skewness and many others. Most of these metrics are available in the tsfresh library. Other economic data (e.g., oil prices, alcohol consumption, etc.) can also be added to broaden the scope of the study. Data preprocessing can be approached differently as well, namely with a wavelet or Fourier transform, which is a significant alternative to the logarithmic transformation used in this paper. As an improvement to the algorithm itself, an optimized library could be created to accelerate the calculations and model building, because the current computation speed is slow. Optimizing the existing code with the numba library, or reimplementing it in C or C++, would significantly speed it up, since the way Python executes code adds processing time to the functions on every iteration.
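
To make the two named metrics concrete, here is a minimal plain-Python sketch of permutation entropy and skewness. It does not call tsfresh and may differ from that library's definitions: the skewness below is the biased (population) form, and the permutation entropy is normalized by the logarithm of the number of possible ordinal patterns. The function names, defaults and the sample series are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns found in sliding windows of `x`."""
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the requested order and delay")
    patterns = Counter()
    for i in range(n_windows):
        window = [x[i + j * delay] for j in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    h = -sum((c / n_windows) * math.log(c / n_windows) for c in patterns.values())
    return h / math.log(math.factorial(order)) if normalize else h


def skewness(x):
    """Biased sample skewness: third central moment over variance to the power 1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    # Skewness is undefined for a constant series.
    return m3 / m2 ** 1.5 if m2 > 0 else math.nan


# Made-up series for illustration only.
trend = [1, 2, 3, 4, 5, 6]
irregular = [4, 7, 9, 10, 6, 11, 3]
pe_trend = permutation_entropy(trend)
pe_irregular = permutation_entropy(irregular)
skew_irregular = skewness(irregular)
```

A strictly monotonic series contains a single ordinal pattern, so its permutation entropy is zero, while a series with several distinct patterns scores between 0 and 1 after normalization. Features like these could be appended to the model-based features before clustering.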

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
