Detecting Propaganda Techniques in News Articles

Student: Chikina Anna

Supervisor: Ekaterina Artemova

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Year of Graduation: 2020

This thesis addresses binary classification of propaganda in news articles. Classification experiments were conducted at two levels: individual words (tokens) and text fragments. The data come from the openly available SemEval 2020 competition dataset, which consists of manually labeled news articles. The classes in the data are strongly imbalanced. The problem was approached with both classical machine learning methods and more complex deep learning methods. Logistic Regression and Support Vector Classification models were trained on linguistic and syntactic features constructed in the work. A neural network was also developed and trained; its architecture includes the BERT tokenizer, the BERT language model, and a bidirectional long short-term memory (BiLSTM) layer. The thesis discusses preprocessing and augmentation methods for balancing the classes, such as replacing words with their synonyms, deleting and inserting tokens in a sentence, and swapping words. It also describes the loss function used to train neural networks on imbalanced data. Results of binary classification are presented for the different model types. They show that a more complex architecture improves metrics on the token classification problem, while for fragment classification the metric values of the simple machine learning models and of the network based on BERT and BiLSTM are comparable. This work can serve as a basis for deeper study of propaganda classification methods. Propaganda detection is a relevant and in-demand problem, and existing algorithms have not yet shown excellent results in classifying words and phrases in a text.
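
The four augmentation operations named in the summary (synonym replacement, token deletion, token insertion, word swapping) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the synonym table `SYNONYMS`, the function names, and the deletion probability `p` are all hypothetical choices made for illustration only.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch);
# a real system would draw synonyms from a lexical resource.
SYNONYMS = {
    "big": ["large", "huge"],
    "lie": ["falsehood"],
}

def synonym_replace(tokens, rng, synonyms=SYNONYMS):
    """Replace one randomly chosen token that has a known synonym."""
    candidates = [i for i, t in enumerate(tokens) if t.lower() in synonyms]
    out = list(tokens)
    if not candidates:
        return out
    i = rng.choice(candidates)
    out[i] = rng.choice(synonyms[out[i].lower()])
    return out

def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_insert(tokens, rng, synonyms=SYNONYMS):
    """Insert a synonym of a random token at a random position."""
    candidates = [t for t in tokens if t.lower() in synonyms]
    out = list(tokens)
    if not candidates:
        return out
    word = rng.choice(synonyms[rng.choice(candidates).lower()])
    out.insert(rng.randrange(len(out) + 1), word)
    return out

def random_swap(tokens, rng):
    """Swap the tokens at two distinct, randomly chosen positions."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def augment(tokens, seed=0):
    """Return one augmented copy of `tokens` per operation."""
    rng = random.Random(seed)
    ops = (synonym_replace, random_delete, random_insert, random_swap)
    return [op(tokens, rng) for op in ops]
```

Calling `augment(["a", "big", "lie"])` returns four variants of the sentence, one per operation. For token-level classification, any such operation would also have to keep the per-token labels aligned with the modified sequence; the sketch leaves that step out.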

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
