
Detecting Propaganda in Russian News Articles with Neural Networks

Student: Grigorev Eduard

Supervisor: Boris Orekhov

Faculty: Faculty of Humanities

Educational Programme: Fundamental and Computational Linguistics (Bachelor)

Year of Graduation: 2020

This thesis addresses the problem of detecting propaganda in Russian-language texts. The problem divides naturally into two subtasks: sentence-level classification (SLC), a binary decision on whether a sentence contains propaganda, and fragment-level classification (FLC), which assigns one of N propaganda-technique classes to a propagandistic text span. We reviewed 68 propaganda techniques and strategies and selected 18 that are suitable for automatic detection. We then compiled a corpus of Russian texts of 42.5 thousand tokens (54 documents) and annotated it with the 18 selected techniques. To detect the techniques automatically, we took neural network architectures previously tested on English propaganda texts and applied them to the Russian data, using the Python machine learning library PyTorch, in order to determine the state-of-the-art (SOTA) quality attainable on this data. In particular, we examined the standard BERT model, an integrated BERT model that solves both tasks (SLC and FLC) simultaneously, and the “granular” and “multi-granular” BERT models. On the SLC task we achieved an F1 score of 0.6382, and on the FLC task a micro-averaged F1 score of 0.1775 for multiclass classification (N = 19). The metric values obtained in the experiments show that switching from the standard multilingual BERT model to RuBERT improves model quality, whereas switching between the architectures listed above does not guarantee an improvement; these architectures therefore require more detailed study in the future. To demonstrate the models, we built a platform with the Python framework Flask in which users can conveniently annotate text for further model training and automatically annotate arbitrary Russian text using the pre-trained models. Such a system makes the use of propaganda techniques in texts explicit and can help raise public awareness.
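The two reported metrics can be illustrated with a minimal sketch. It assumes one gold label and one predicted label per item; the thesis may score FLC differently (for example, at the level of overlapping spans), so this is not the author's evaluation code. The function names, the label strings `T1`, `T2`, `T3`, and the "no propaganda" label `O` are hypothetical and chosen only for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_f1(gold, pred, positive=1):
    """F1 for the positive class, as in a binary sentence-level (SLC) setup."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    return f1_from_counts(tp, fp, fn)


def micro_f1(gold, pred, ignore_label=None):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1.

    ignore_label (hypothetical convention) excludes one class, such as a
    "no propaganda" label, from the pooled counts.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore_label:
                tp += 1
        else:
            if p != ignore_label:
                fp += 1  # predicted class p is wrong: false positive for p
            if g != ignore_label:
                fn += 1  # gold class g was missed: false negative for g
    return f1_from_counts(tp, fp, fn)


# Toy data, not taken from the thesis corpus.
slc_gold = [1, 0, 1, 1, 0]
slc_pred = [1, 1, 0, 1, 0]
flc_gold = ["T1", "O", "T2", "O", "T3"]
flc_pred = ["T1", "T2", "T2", "O", "O"]

print(binary_f1(slc_gold, slc_pred))
print(micro_f1(flc_gold, flc_pred, ignore_label="O"))
print(micro_f1(flc_gold, flc_pred))
```

When no class is ignored and every item has exactly one label, micro-averaged F1 equals plain accuracy, because each error counts once as a false positive and once as a false negative.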

