Student: Nikolaj Popov
Title: Data Mining and Machine Learning for social networking websites
Faculty: School of Applied Mathematics and Information Science
Educational Programme: Bachelor’s programme
Year of Graduation: 2014
The aim of this work was to predict the number of likes a post receives by analyzing not only its content but also its potential audience, using Data Mining and Machine Learning. The data came from the social network "Odnoklassniki" and was provided by the SNA Hackathon competition (St. Petersburg, Russia, April 2014, http://sh2014.org/). Initially, the problem was solved with linear regression. Various regression factors were considered, such as the presence of images and links, the proportion of capital letters, whether words belong to a frequency dictionary composed specially for this purpose, the day of the week, and the average number of likes in the group; those that gave the most accurate results were selected. After that, another predictor, cluster membership, was added. All experiments were carried out in «IPython Notebook». The split into two clusters was produced by modularity-based methods: "Fast greedy community" and "Edge betweenness community". Both methods are available in «Pajek», which was used for the clustering. The main result is a solution to the SNA Hackathon contest problem that scored 0.231, not far behind the competition leader, who scored 0.303. In addition, a linear regression model was built and predictors were selected for it. The most notable result is the successful use of clustering methods in a predictive problem of this type. Hypotheses on how to improve the results were also proposed, and directions for future research were indicated.
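
The regression setup described in the summary can be illustrated with a minimal sketch. It is not the thesis code: the post fields (`text`, `has_image`, `created`), the weekend indicator used in place of the day-of-week factor, and the function names are all assumptions made for illustration, and the cluster label is assumed to be 0 or 1, as in a split into two clusters.

```python
import re
from datetime import datetime

import numpy as np


def extract_features(post, group_mean_likes, cluster_id):
    """Build one feature row for a post (hypothetical schema, not the contest format)."""
    text = post["text"]
    letters = [ch for ch in text if ch.isalpha()]
    # Proportion of capital letters among all letters in the text.
    caps_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    has_link = 1.0 if re.search(r"https?://", text) else 0.0
    has_image = 1.0 if post["has_image"] else 0.0
    # Simplified day-of-week factor: 1.0 for Saturday or Sunday, else 0.0.
    is_weekend = 1.0 if datetime.fromisoformat(post["created"]).weekday() >= 5 else 0.0
    # Leading 1.0 is the intercept term; cluster_id (0 or 1) acts as an indicator.
    return [1.0, has_image, has_link, caps_ratio, is_weekend,
            group_mean_likes, float(cluster_id)]


def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(X, coef):
    """Predicted number of likes for each feature row in X."""
    return X @ coef
```

In this sketch, each post becomes one row of the matrix `X`, the observed like counts form `y`, and comparing the fit with and without the last column shows how much the cluster predictor contributes.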

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
