Automatic Analysis of Literary Texts in Russian

Student: Averchenkova Anna

Supervisor: Dmitry Ilvovsky

Faculty: Faculty of Computer Science

Educational Programme: Financial Technology and Data Analysis (Master)

Year of Graduation: 2019

Nowadays, people face a large volume of textual information, starting from their school years. Texts whose content and structure must be understood range from specialized textbooks and literary works studied at school to documents, laws, letters, and instructions for workers in various professions. To grasp the essence of such a text without reading it in full, one can turn to a shorter, condensed presentation that mainly contains the facts.

This thesis implements one way to quickly understand the structure of interactions in a literary work: constructing a graph of relations between its main characters. Two methods are proposed for extracting interactions between characters: a frequency method based on how often characters co-occur in different text units, and a method based on word embeddings, where interactions are judged by cosine similarity. The methods are applied to the novel "The Three Musketeers" by A. Dumas. The quality of the implemented approaches is assessed by comparison with the answers of respondents. For the main characters, the method of frequency co-occurrence in paragraphs achieves the highest overlap between its top 3 characters and the respondents' answers: 71%.
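The two extraction methods and the top-3 evaluation described above can be illustrated with a minimal sketch. It assumes the text is already split into paragraphs, that each character is matched by a hand-written list of lowercase aliases (plain token matching, so pronouns are ignored), and that embedding vectors for the characters are already available. All function names, aliases, toy paragraphs, vectors, and respondent answers here are hypothetical and invented for illustration; this is not the thesis's actual implementation. The pair counts play the role of edge weights in the character graph.

```python
import math
import re
from collections import Counter
from itertools import combinations


def mentioned(paragraph, aliases):
    """Return the sorted names of characters whose alias appears as a token in the paragraph."""
    tokens = set(re.findall(r"[\w']+", paragraph.lower()))
    return sorted(name for name, forms in aliases.items()
                  if any(form in tokens for form in forms))


def cooccurrence_counts(paragraphs, aliases):
    """Edge weights of a character graph: (a, b) -> number of paragraphs mentioning both."""
    counts = Counter()
    for paragraph in paragraphs:
        for a, b in combinations(mentioned(paragraph, aliases), 2):
            counts[(a, b)] += 1
    return counts


def top_partners(counts, character, k=3):
    """The k characters co-occurring most often with `character` (ties broken alphabetically)."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == character:
            scores[b] += n
        elif b == character:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def top_by_embedding(vectors, character, k=3):
    """The k characters whose vectors are most cosine-similar to `character`'s vector."""
    others = [name for name in vectors if name != character]
    others.sort(key=lambda name: (-cosine(vectors[character], vectors[name]), name))
    return others[:k]


def overlap_at_k(predicted, reference, k=3):
    """Share of the top-k predicted characters that also appear in the reference top k."""
    return len(set(predicted[:k]) & set(reference[:k])) / k


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from the novel or the thesis).
    paragraphs = [
        "D'Artagnan met Athos at the convent.",
        "Athos, Porthos and Aramis drew their swords.",
        "D'Artagnan and Athos rode to Calais with Planchet.",
        "Aramis wrote a letter.",
    ]
    aliases = {name: [name.lower()] for name in
               ["D'Artagnan", "Athos", "Porthos", "Aramis", "Planchet"]}
    counts = cooccurrence_counts(paragraphs, aliases)
    predicted = top_partners(counts, "Athos")
    print(predicted)  # ["D'Artagnan", 'Aramis', 'Planchet']
    # Hypothetical respondents' answer for Athos: 2 of 3 names shared.
    print(overlap_at_k(predicted, ["D'Artagnan", "Porthos", "Aramis"]))
    # Hypothetical 3-dimensional embeddings standing in for trained word vectors.
    vectors = {"Athos": [1.0, 0.2, 0.0], "D'Artagnan": [0.9, 0.3, 0.1],
               "Porthos": [0.8, 0.0, 0.2], "Milady": [0.0, 1.0, 0.0]}
    print(top_by_embedding(vectors, "Athos", k=2))  # ["D'Artagnan", 'Porthos']
```

Sorting each paragraph's character list before forming pairs makes every unordered pair map to a single key, so counts for (a, b) and (b, a) are never split. Switching the text unit (for example, from paragraphs to sentences) only changes how the input list is built.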

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
