
Towards Mining Dialogue Scenarios from Plays: Clustering, Classification and Semantic Labeling of Play Graphs

Student: Vlasov Vladimir

Supervisor: Eduard Klyshinskiy

Faculty: Faculty of Humanities

Educational Programme: Computational Linguistics (Master)

Final Grade: 7

Year of Graduation: 2020

This paper describes a method for clustering individual dialogue phrases and short dialogues of three phrases each. As part of this work, an approach was developed for clustering dialogue phrases using sentence vectors from the Cross-language Transformer/CNN encoder and the KMeans clustering method. A language model based on a recurrent neural network was then used to obtain vector representations of the short dialogues, which were also clustered. The research was conducted on the Russian Drama Corpus (RusDraCor), and its results can be used both in digital humanities and in commercial settings for semi-automatic labeling of dialogues.
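
The phrase-clustering step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the two-dimensional vectors are hypothetical stand-ins for the sentence-encoder output, the example phrases are invented, and the `kmeans` function is a plain-Python version of the standard KMeans procedure written only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain KMeans: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            new_centroids.append(mean_vector(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical phrase vectors; a real pipeline would take these from a
# sentence encoder and they would have several hundred dimensions.
phrase_vectors = {
    "Who is there?": [0.90, 0.10],
    "What do you want?": [0.80, 0.20],
    "Where are you going?": [0.85, 0.15],
    "Farewell!": [0.10, 0.90],
    "Goodbye, my friend.": [0.20, 0.80],
    "Until tomorrow.": [0.15, 0.85],
}

phrases = list(phrase_vectors)
labels, centroids = kmeans([phrase_vectors[p] for p in phrases], k=2)

for phrase, label in zip(phrases, labels):
    print(label, phrase)
```

In this toy data the three questions and the three farewells end up in separate clusters. Cluster numbering depends on the random initialisation, so only the grouping is meaningful, not the label values.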

Full text (added June 4, 2020)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
