
Social and Political Processes Online: The Structure and Content of Social Interactions

Priority areas of development: sociology, engineering science
2014

This research consists of three sub-projects, each with its own object and goals.

The first sub-project, “Mapping ethnic attitudes in the Russian-language LiveJournal with advanced topic modeling”, investigates representations of ethnic groups in Russian blogs while addressing the stability of the topic modeling algorithm used for mining ethnic discourse. The second sub-project, “Structure of communities in social networking websites”, studies a range of network, structural and content features of certain types of groups in the VKontakte social network, including professional groups of software developers, groups of the St. Petersburg observers' social movement, and anti-medical groups. The third sub-project, “Online recommender systems: analysis of publications and new developments”, focuses on developing recommender systems for sparse data and on publications in this field.

Research goals. The first sub-project maps the attitudes of Russian-language bloggers towards various ethnic groups; it also optimizes the stability of a topic modeling algorithm aimed at finding ethnicity-related topics. The second sub-project aims to find the relationship between the network structure of communities in the VKontakte SNS and the socio-demographic and other properties of these communities. The third sub-project develops new algorithms for recommender systems and analyses the latest publications in this field.

Empirical base of the research. The empirical base of the first sub-project includes: (a) 363,579 posts by the top 2,000 users of the LiveJournal blogging platform according to the social capital rating; time period: 11 weeks from February 4 to May 19, 2013; 990 posts selected for manual analysis; (b) a dataset of 101,481 posts for testing regularizers for the Latent Dirichlet Allocation topic modeling algorithm. The empirical base of the second sub-project consists of: (a) 11 groups of software developers in the VKontakte SNS with over 10,000 users, including one group of 15,451 users selected for in-depth analysis; (b) 17 district groups of St. Petersburg observers and one all-city group, whose data were collected at 16 time points, in total over 13 thousand users; (c) a 2.0 level hyperlink ego-network of an AIDS-denialist group in the VKontakte SNS consisting of 11 groups. The empirical base of the third sub-project contains: (a) a dataset from the FMhost online radio broadcasting service consisting of 4,266 users, 3,618 tags, 2,209 radio stations and 4,165 tracks; (b) an automatically generated collection of research papers about recommender systems drawn from the top 18 relevant conferences.

Research results

Sub-project 1. Two types of ethnic groups are discussed most intensively in blogs: most often, distant “geopolitical foes” (e.g. Americans), and less often, proximate but socially problematic groups (e.g. Tajiks). Three quarters of the texts discuss ethnic groups in either political or cultural / ritual contexts, and the former prevails. With high probability, some nations are discussed in one particular context, while others are associated with another. The five most negatively described nations are “Caucasian”, Tajik, Dagestani, American and African/Negro. Tajik and Chechen are also among the ethnicities most often described as inferior. Dagestani, American, British and Caucasian are among the six described as most dangerous; Dagestani, American, British, German and Chechen are among the six described as most alien. It has also been found that as early as winter and early spring 2013, long before the Ukrainian crisis, two Ukrainian topics were present in the blogosphere that included all the main parties, characters and problematic points of the future conflict.

The study of the stability of three topic modeling algorithms has shown that the proposed method of granulated sampling leads to the highest increase in the number of stable topics compared to the other two algorithms: to 135 of 200 against 84 and 135, when measured with the normalized Kullback-Leibler metric. It also gives a much higher value of the Jaccard index (0.6 against 0.3).
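To illustrate how such stability measures can be computed, the following is a minimal Python sketch, not the project's implementation. It assumes each topic is represented either by a list of its top words (for the Jaccard index) or by a probability distribution over a shared vocabulary (for a symmetrised Kullback-Leibler divergence). The function names, the smoothing constant `eps`, the matching rule and the threshold are all illustrative assumptions; the exact normalization used in the study is not described here.

```python
import math


def jaccard(top_a, top_b):
    """Jaccard index between the top-word sets of two topics."""
    a, b = set(top_a), set(top_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def symmetric_kl(p, q, eps=1e-12):
    """Symmetrised KL divergence between two word distributions.

    p and q are sequences of probabilities over the same vocabulary;
    eps is an assumed smoothing constant that avoids log(0).
    """
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def count_stable_topics(run_a, run_b, threshold):
    """Count topics from run_a whose closest topic in run_b is within threshold.

    This treats a topic as stable if it reappears, almost unchanged,
    in a second run of the same algorithm (an assumed definition).
    """
    return sum(
        1 for p in run_a
        if min(symmetric_kl(p, q) for q in run_b) < threshold
    )


# Toy example: two runs over a three-word vocabulary.
run_a = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.34, 0.33, 0.33]]
run_b = [[0.1, 0.15, 0.75], [0.65, 0.25, 0.1]]
stable = count_stable_topics(run_a, run_b, threshold=0.05)
overlap = jaccard(["war", "army", "border"], ["army", "border", "gas"])
```

In this toy example the first two topics of `run_a` have close counterparts in `run_b`, while the third, nearly uniform, topic does not. Lower divergence and higher Jaccard overlap both indicate that a topic is reproduced across runs.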

Sub-project 2. The research has revealed that the district groups of St. Petersburg observers did not emerge as independent movements but are branches of the all-city movement, albeit affiliated with it to varying degrees. Their activity peaks at the very start, during the 2011-2012 national electoral cycle; however, the groups that were “alive” from the beginning do not die but stabilize. Group moderators, that is, movement leaders, set the agenda, while the community expresses opinions on it (comments) and approval / solidarity (likes). Group size depends on the number of moderators' posts, but not on the number of individual posts. This indicates the central role of leadership in a group's success. Offline leaders of the movement are well predicted by their online properties, in particular their centrality in the overall friendship network, the number of district groups they belong to, and the volume of feedback they receive.

It has also been found that none of the ties established in the studied professional community of software developers emerge on the basis of users' geolocation; this confirms the hypothesis that geo-independent communities exist online. The study of the ego-network of the AIDS-denialist movement has not found sufficient evidence that it is part of a broader anti-medical movement.

Sub-project 3. Three new algorithms for recommender systems with tags have been developed, including TagLDA. Experiments have shown that they perform better than traditional algorithms on relatively small sparse datasets. An overview of results and trends in the field of recommender systems has been prepared based on the latest relevant publications. Software that trains the TagLDA algorithm has also been developed.

Implementation of research results. Algorithms tested in the first sub-project are implemented in software that is used in other projects of the Laboratory for Internet Studies. The recommender algorithms may be used in any small-scale commercial recommender system. The methods for analysing ethnic discourse may be used to map other types of user attitudes and thus serve as an analytical base for policies in the relevant areas.

Publications:


Bodrunova S., Nikolenko S. I., Koltsova O., Koltsov S., Shimorina A. Interval Semi-Supervised LDA: Classifying Needles in a Haystack, in: Proceedings of the 12th Mexican International Conference on Artificial Intelligence (MICAI 2013). Berlin : Springer Verlag, 2013. pp. 265-274.
Ignatov D. I., Nikolenko S. I., Abaev T., Konstantinova N. Online Recommender System for Radio Station Hosting: Experimental Results Revisited, in: Proceedings of The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2014, 11-14 August 2014, Warsaw, Poland. Los Alamitos : IEEE Computer Society Conference Publishing Services (CPS), 2014. pp. 229-236.
Koltsov S. N., Koltsova E. Yu., Mitrofanova O. A., Shimorina A. Interpretation of semantic relations in texts of the Russian-language segment of LiveJournal based on the LDA topic model (in Russian), in: Information Society Technologies in Science, Education and Culture. Collection of research papers. Proceedings of the XVII All-Russian Joint Conference “Internet and Modern Society”, St. Petersburg, November 19-20, 2014. St. Petersburg : ITMO University, 2014. pp. 135-142.
Vidiasova L. A., Koltsov S. N., Chugunov A. V. Agenda setting in the field of e-government: results of a content analysis of news reports (in Russian), in: Information Society Technologies in Science, Education and Culture. Collection of research papers. Proceedings of the XVII All-Russian Joint Conference “Internet and Modern Society”, St. Petersburg, November 19-20, 2014. St. Petersburg : ITMO University, 2014. pp. 124-128.
Alexeeva S. V., Koltsova E. Yu., Koltsov S. N. Public opinion online: comparing the structure and topics of posts by “ordinary” and “popular” LiveJournal bloggers (in Russian), in: Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014). Yekaterinburg : CEUR Workshop Proceedings, 2014. pp. 177-181.
Koltsov S., Koltsova O., Nikolenko S. I. Latent Dirichlet Allocation: Stability and Applications to Studies of User-Generated Content, in: Proceedings of WebSci '14 ACM Web Science Conference, Bloomington, IN, USA, June 23-26, 2014. New York : ACM, 2014. pp. 161-165.
Koltsova O., Koltcov S., Alexeeva S. Do ordinary bloggers really differ from blog celebrities?, in: Proceedings of WebSci '14 ACM Web Science Conference, Bloomington, IN, USA, June 23-26, 2014. New York : ACM, 2014. pp. 166-170.
Mitrofanova O. A., Shimorina A. S. Modeling semantic relations in social network texts with the LDA algorithm (based on the Russian-language segment of LiveJournal) (in Russian), in: Structural and Applied Linguistics. St. Petersburg : St. Petersburg State University Press, 2014.
Nikolenko S. I., Koltsov S., Koltsova O. Measuring Topic Quality in Latent Dirichlet Allocation, in: Proceedings of the Philosophy, Mathematics, Linguistics: Aspects of Interaction 2014 Conference. St. Petersburg : Euler International Mathematical Institute, 2014. pp. 149-157.
Structural and Applied Linguistics (in Russian). St. Petersburg : St. Petersburg State University Press, 2014.
Ignatov D. I., Nikolenko S. I., Abaev T., Poelmans J. Improving Quality Of Service For Radio Station Hosting: An Online Recommender System Based On Information Fusion / Higher School of Economics. Series MAN "Management". 2014. No. 31.
Koltsov S., Koltsova O., Mitrofanova O. ..