Sentiment Frames Extraction

Student: Schennikov Nikita

Supervisor: Svetlana Toldova

Faculty: Faculty of Humanities

Educational Programme: Fundamental and Computational Linguistics (Bachelor)

Final Grade: 7

Year of Graduation: 2020

Sentiment analysis is a highly demanded and rapidly developing branch of Natural Language Processing (NLP), and it is widely used across different fields of NLP. However, some studies in this branch have considerable problems: most of them are narrowly specialized and produce results that do not transfer to a variety of tasks. Because most previous work is based on thematic datasets, this paper aims to shed some light on the problem with an approach that differs from the standard one. Despite the increasing popularity and usability of Transformers such as BERT and OpenAI GPT-2 and their outstanding text predictions, sentiment analysis of non-topical texts remains a rather hard task even for them.

The main idea of this work is to broaden the existing sentiment frames lexicon using classification models. In this research, we apply the sentiment frames method to sentiment analysis, building on the work of Karnaukhova and Loukachevitch. A sentiment frame is a verb-mediated model of specific connections between predicates, which relies on the idea that the predicate can affect the polarity of the subject and object of the sentence (e.g. “X wins Y” makes X positive and Y negative). According to Deng and Wiebe, a verb can assign a certain sentiment to the words that are connected to it. The frame therefore acts as a set of encyclopedic, linguistic and cognitive knowledge.

Using a Twitter-based corpus of short texts, we aim to address the problem of thematic and objective datasets. From this data we extract information about verbs and convert it into features for classification models. The data from short texts is extracted by two different methods. The first is a tokenizer from NLTK combined with pymorphy2. The second is DeepPavlov, a more accurate and more highly developed BERT-based model. Using these two types of methods gives us a first look at the problem and the results, followed by deeper, contextualized research. We believe that this new approach can provide information that is useful for solving the problem of sentiment analysis. In the future, sentiment frames combined with modern solutions for classification and embeddings have the potential to offer a fresh look at this problem.
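The frame idea described above (“X wins Y” makes X positive and Y negative) can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `Frame`, `FRAMES` and `apply_frame` are hypothetical, the lexicon holds only the single example given in the abstract, and the input is assumed to be an already extracted (subject, verb lemma, object) triple.

```python
from typing import Dict, NamedTuple, Optional


class Frame(NamedTuple):
    """Polarity that a verb assigns to its subject and its object."""
    subject_polarity: str
    object_polarity: str


# Hypothetical toy lexicon; the only entry is the example from the abstract.
FRAMES: Dict[str, Frame] = {
    "win": Frame(subject_polarity="positive", object_polarity="negative"),
}


def apply_frame(
    subject: str,
    verb_lemma: str,
    obj: str,
    lexicon: Dict[str, Frame] = FRAMES,
) -> Optional[Dict[str, str]]:
    """Return the polarity of subject and object, or None if the verb has no frame."""
    frame = lexicon.get(verb_lemma)
    if frame is None:
        return None
    return {subject: frame.subject_polarity, obj: frame.object_polarity}


if __name__ == "__main__":
    print(apply_frame("X", "win", "Y"))
```

A verb that is missing from the lexicon returns `None`; such uncovered verbs are the cases that a classification model would be expected to fill in when broadening the lexicon.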

Full text (added May 30, 2020)
