
Student: Danil Gizdatullin
Title: Classification Algorithm Based on Sequential Emerging Patterns and Pattern Structures
Educational Programme: Data Science (Master’s programme)
Year of Graduation: 2016
The analysis of demographic sequences is a popular and promising direction of study in demography. People's life courses consist of chains of events in different spheres of life. Scientists are interested in moving from the analysis of separate events and their interrelations to the analysis of whole sequences of events. However, this transition is slowed by the technical difficulty of working with sequences. At present, demographers and sociologists lack a simple, accessible tool for such analysis.

Human demographic behaviour can vary greatly across generations, gender, education level, religious views, etc.; however, hidden similarities can be found and generalised by specially designed techniques. Although many methods have been developed so far, the field is far from converging with the traditional sequence mining techniques studied in Data Mining. Machine Learning (ML) and Data Mining (DM) are rather young and rapidly developing fields that require professional knowledge of computer science, which is usually lacking in the social sciences.

In this paper, several approaches to sequence data analysis were studied. The main goal was to find interesting and interpretable patterns that can distinctly characterize different classes. The object of the study was demographic data about people’s life events. The subject of the study is the application of data mining methods to pattern discovery.

To solve this problem, an algorithm for mining emerging patterns was developed in Python. Many experiments were performed to find optimal parameters for pattern mining. A gender classification task was also studied using sequences of life events.
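As an illustration of the general idea only, and not the algorithm developed in the thesis, the sketch below mines emerging prefix patterns under the standard definition: a pattern is emerging for a target class if its growth rate (support in the target class divided by support in the other class) reaches a threshold. The function names, the parameters `min_growth`, `min_support` and `max_len`, and the toy event sequences are all assumptions made for this example.

```python
from collections import Counter


def prefix_supports(sequences, max_len):
    """Return the fraction of sequences that start with each prefix (length <= max_len)."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, min(len(seq), max_len) + 1):
            counts[tuple(seq[:k])] += 1
    n = len(sequences)
    return {prefix: c / n for prefix, c in counts.items()}


def emerging_prefixes(target, background, min_growth=2.0, min_support=0.1, max_len=3):
    """Return {prefix: growth_rate} for prefixes that are frequent in `target`
    and at least `min_growth` times more frequent there than in `background`."""
    supp_t = prefix_supports(target, max_len)
    supp_b = prefix_supports(background, max_len)
    result = {}
    for prefix, s in supp_t.items():
        if s < min_support:
            continue
        b = supp_b.get(prefix, 0.0)
        # A prefix absent from the background class has infinite growth rate.
        growth = float("inf") if b == 0 else s / b
        if growth >= min_growth:
            result[prefix] = growth
    return result


# Toy, made-up life-event sequences (not data from the thesis).
men = [
    ["work", "education", "marriage"],
    ["work", "marriage", "child"],
    ["education", "work", "marriage"],
]
women = [
    ["education", "marriage", "child"],
    ["marriage", "child", "work"],
    ["education", "work", "child"],
]

men_patterns = emerging_prefixes(men, women, min_growth=2.0, min_support=0.5)
women_patterns = emerging_prefixes(women, men, min_growth=2.0, min_support=0.5)
print(men_patterns)  # {('work',): inf}
```

Restricting the search to prefixes keeps the candidate space small and the resulting patterns easy to read, at the cost of ignoring patterns that occur later in a sequence.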

As a result, gender-identifying patterns were revealed and interpreted, and a dedicated classification algorithm using emerging prefix patterns was developed.
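One simple way such patterns could be used for classification is sketched below. This is a hypothetical scoring rule, not the classifier from the thesis: each class is scored by the summed (capped) growth rates of its emerging prefixes that match the start of a new sequence, and the highest-scoring class wins. The pattern dictionary, its growth rates, the `cap` parameter and the function name `classify` are assumptions for illustration.

```python
def classify(sequence, patterns_by_class, cap=10.0):
    """Score each class by its emerging prefixes that match the start of `sequence`.

    Returns (best_label, scores); best_label is None when no pattern matches.
    Growth rates are capped so that infinite values do not dominate the sum.
    """
    scores = {}
    for label, patterns in patterns_by_class.items():
        score = 0.0
        for prefix, growth in patterns.items():
            if tuple(sequence[:len(prefix)]) == prefix:
                score += min(growth, cap)
        scores[label] = score
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


# Hypothetical pre-mined patterns with made-up growth rates.
patterns_by_class = {
    "men": {("work",): float("inf"), ("work", "marriage"): 3.0},
    "women": {("education",): 2.0, ("education", "marriage"): 4.0},
}

label, scores = classify(["education", "marriage", "child"], patterns_by_class)
unknown_label, _ = classify(["child", "work"], patterns_by_class)
```

Returning `None` when nothing matches makes the classifier's abstentions explicit, which is useful when reporting accuracy on sequences the patterns do not cover.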

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
