Development of a Multilingual Toxic Comment Classification Model

Student: Iakushechkin Dmitrii

Supervisor: Vyacheslav Zhukov

Faculty: Graduate School of Business

Educational Programme: Electronic Business (Master)

Year of Graduation: 2020

This master's thesis examines the theoretical and practical aspects of toxic comment classification. The first chapter analyses the subject field: the concept of toxic comments, natural language processing tasks and problems, and a comparative analysis of existing approaches to solving these problems. The second chapter develops the research approach, in particular the rationale for the choice of natural language processing models, the process of developing a deep learning model, and the algorithms of the two models used to solve the research problem, mBERT and XLM-RoBERTa. The third chapter presents the results of developing the comment classification model and an analysis of those results.

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
