Student: Elena Chudakova
Title: Active Learning for Sentiment Analysis
Educational Programme: Data Science (Master’s programme)
Year of Graduation: 2018

Sentiment analysis is a common task in industry. A special case is classifying documents by sentiment. Such tasks arise constantly in recommender systems, marketing departments, and product quality tracking departments. Deep learning methods can achieve good classification quality, but their main drawback is that they need a large number of labeled training examples to produce a good model. Given the popularity of neural networks, a pressing problem is finding an algorithm that selects training examples not at random but by criteria that indicate how informative each example will be for building the model.

This task is addressed within the framework of active learning. A small number of labeled examples is first used for training. Then, from a pool of unlabeled data, we select not a random subset but the examples that maximize some informativeness function. These examples are passed to an annotator (an oracle) for labeling and are added to the training set. The process is repeated several times, which significantly reduces the resources spent on manual labeling.
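
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the informativeness function is taken to be predictive entropy (the abstract does not name one), a nearest-centroid model stands in for the convolutional and recurrent networks, and the names `active_learning`, `fit_centroids`, `predict_proba`, and `oracle` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one probability vector; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        model[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict_proba(model, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    z = [-math.dist(x, centroid) for centroid in model.values()]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def active_learning(pool, oracle, seed_idx, batch_size=5, n_rounds=4):
    """Grow the labeled set by querying the most uncertain pool items."""
    # Start from a small labeled seed set.
    labels = {i: oracle(i) for i in seed_idx}
    for _ in range(n_rounds):
        model = fit_centroids([pool[i] for i in labels], list(labels.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        # Rank unlabeled items by informativeness (here: predictive entropy).
        unlabeled.sort(key=lambda i: entropy(predict_proba(model, pool[i])),
                       reverse=True)
        # Ask the annotator for labels and add them to the training set.
        for i in unlabeled[:batch_size]:
            labels[i] = oracle(i)
    model = fit_centroids([pool[i] for i in labels], list(labels.values()))
    return model, labels
```

Here `pool` is a list of feature vectors and `oracle(i)` returns the label of item `i`, playing the role of the human annotator. Swapping in a different informativeness function only requires changing the sort key.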

In this work, several approaches are used to determine the minimum number of examples on which training yields a model of quality comparable to the baseline. The methods are applied to the classification of bank customer reviews. Convolutional and recurrent neural networks are considered as the initial classifiers. Active learning methods make it possible to significantly reduce the size of the training set.
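
One simple way to read "minimum number of examples comparable to the baseline" off a learning curve is sketched below. The function name `min_labeled_size`, the `tolerance` parameter, and the criterion itself are assumptions made for illustration; the abstract does not state how comparability was measured.

```python
def min_labeled_size(curve, baseline, tolerance=0.02):
    """Return the smallest labeled-set size whose score is within
    `tolerance` of the baseline score, or None if no size qualifies.

    `curve` is an iterable of (n_labeled, score) pairs.
    """
    for n_labeled, score in sorted(curve):
        if score >= baseline - tolerance:
            return n_labeled
    return None

# Placeholder numbers for illustration only; these are NOT results from the thesis.
example_curve = [(100, 0.70), (200, 0.80), (400, 0.86), (800, 0.87)]
smallest = min_labeled_size(example_curve, baseline=0.87)
```

With these placeholder values the function returns 400, the first size whose score reaches at least 0.85.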
