Experimental Investigation of Methods for Predicting the Popularity on News Reports

Student: Lebedev Innokentii

Educational Programme: Software Engineering (Bachelor)

Year of Graduation: 2017

This study describes an approach to forecasting message popularity in social networks based on a model of user interests. The paper reviews traditional approaches to popularity forecasting, in particular several theoretical algorithms that have been applied to the problem, as well as open and commercial tools for monitoring and analyzing popularity trends. It then presents a novel model for describing the interests of a user. The model is based on sentiment analysis of the texts the user has written in the social network, and it is applied to the problem of forecasting the popularity of text messages. The forecast is based on analysis of the message text, the profile of the message's author, and the community in which the message is posted. The language of the texts is Russian, and the objects of the study are communities of the VKontakte (VK) social network. As part of the research, an auxiliary system was created for collecting network and textual information from online social networks. The method was trained and evaluated on randomly selected messages from 6,000 VK communities. Random Forest, Support Vector Machine and k-Nearest Neighbors classification algorithms were used. The resulting classifier assigns a message to one of four popularity intervals, obtained by splitting the distribution of the number of likes per message at its quantiles. With the proposed method, the resulting F1-measure was 0.77, which is 12% higher than that of the baseline algorithm, which does not use the thematic properties of the community. The paper contains 56 pages, 3 chapters, 29 figures, 16 tables, 44 references and 2 appendices.

Key words: sentiment analysis, classification, social networks, machine learning, feature selection, model training, popularity forecasting.
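The quantile-based labeling mentioned in the abstract can be illustrated with a minimal sketch. The summary does not state which quantiles were used or how boundary values were handled, so the quartile cut points, the "inclusive" quantile method, the rule that a value equal to a cut point falls into the lower interval, the function names and the like counts below are all assumptions made for illustration, not the author's implementation.

```python
from bisect import bisect_left
from statistics import quantiles


def popularity_cut_points(like_counts):
    """Return three quartile cut points for a collection of like counts.

    Assumption: four intervals are taken to mean quartiles, and the sample
    is treated as the whole population (method="inclusive").
    """
    return quantiles(like_counts, n=4, method="inclusive")


def popularity_class(likes, cut_points):
    """Map a like count to a class 0..3, where 0 is the least popular interval.

    Assumption: a count equal to a cut point is placed in the lower interval.
    """
    return bisect_left(cut_points, likes)


# Made-up like counts, purely illustrative.
likes = [0, 1, 2, 3, 5, 8, 13, 21, 34]
cuts = popularity_cut_points(likes)
labels = [popularity_class(x, cuts) for x in likes]
```

Labels produced this way could serve as the target variable for any of the classifiers named in the abstract. With heavily skewed like counts many messages may share the same value, so the four classes will not necessarily be of equal size.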
