Rhetorical Structure Theory Application in the Detection of Deceptive News Stories

Student: Pisarevskaya Dina

Supervisor: Svetlana Toldova

Faculty: Faculty of Humanities

Educational Programme: Computational Linguistics (Master)

Year of Graduation: 2016

Deception detection in news stories is an urgent problem, especially today, when we deal with large amounts of information from diverse sources. Our society needs new tools for automated deception detection and information verification in online media, based on linguistic methods and models. At the discourse level of natural language processing, the framework of Rhetorical Structure Theory (RST) can be applied to this task. Our corpus consists of 134 news stories, both truthful and deceptive. We hypothesize that the structures of truthful news stories differ significantly from those of deceptive (fake) news stories, and that these differences arise from particular features of the RST relations holding between discourse units in the texts. Our aim is to reveal these differences using RST relations as markers of deception. The annotations of our texts use 33 relation categories from the extended list of Mann and Thompson. For text classification, we used Support Vector Machines (with linear and RBF kernels) and a Random Forest classifier, with 10-fold cross-validation for both classifiers. As attributes we used the frequencies of rhetorical relation categories, as well as two combinations: relation categories plus bigrams of categories, and relation categories plus trigrams of categories. This produced three experimental data sets. The best results were obtained with the linear-kernel SVM on the first data set, which was more linearly separable into the two classes. The model's predictive power (0.65) exceeds the figure reported in a similar study for English (0.56), as well as human ability to recognize deceptive news. The model could therefore serve as a preliminary filter for deceptive (fake) news detection, leaving truthful news in the set.
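The three feature sets described above can be illustrated with a minimal Python sketch. It assumes that each annotated text has been flattened into a linear sequence of RST relation labels (for example, in the order the relations appear in the annotation) and that "frequencies" means relative frequencies within each n-gram order; the summary specifies neither. The names `ngrams`, `relation_features`, `vectorize` and `FEATURE_SETS`, the `|` separator and the toy label sequences are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical encoding of the three data sets: relation categories alone,
# categories + bigrams of categories, categories + trigrams of categories.
FEATURE_SETS: Dict[str, Tuple[int, ...]] = {
    "categories": (1,),
    "categories+bigrams": (1, 2),
    "categories+trigrams": (1, 3),
}


def ngrams(labels: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams over a linear sequence of relation labels."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]


def relation_features(labels: Sequence[str], orders: Iterable[int]) -> Dict[str, float]:
    """Relative frequency of each relation n-gram, normalised within each order n."""
    features: Dict[str, float] = {}
    for n in orders:
        grams = ngrams(labels, n)
        if not grams:  # text has too few relations for this n-gram order
            continue
        total = len(grams)
        for gram, count in Counter(grams).items():
            features["|".join(gram)] = count / total
    return features


def vectorize(docs: Sequence[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Align per-text feature dicts into one matrix; absent features become 0.0."""
    vocab = sorted({name for doc in docs for name in doc})
    return vocab, [[doc.get(name, 0.0) for name in vocab] for doc in docs]


if __name__ == "__main__":
    # Toy relation sequences for two texts, invented for illustration only.
    corpus = [
        ["Elaboration", "Attribution", "Elaboration", "Contrast"],
        ["Evidence", "Elaboration"],
    ]
    for name, orders in FEATURE_SETS.items():
        vocab, matrix = vectorize([relation_features(t, orders) for t in corpus])
        print(name, len(vocab), "features")
```

The resulting matrix, together with truthful/deceptive labels, could then be passed to any classifier. In scikit-learn, for instance, `cross_val_score(SVC(kernel="linear"), X, y, cv=10)` would run a 10-fold evaluation of the same general shape as the setup described in the abstract, although the summary does not say which toolkit or hyperparameters were actually used.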

Student theses at HSE must be completed in accordance with the University rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
