
Development of a System for Generating Difficult-to-Pronounce Russian-Language Texts Using a Syntactically Annotated Corpus

Student: Ponomareva Iuliia

Supervisor: Eduard Klyshinskiy

Faculty: HSE Tikhonov Moscow Institute of Electronics and Mathematics (MIEM HSE)

Educational Programme: Computer Systems and Networks (Master)

Final Grade: 9

Year of Graduation: 2019

This master's thesis, "Development of a system for generating difficult-to-pronounce Russian texts using syntactically annotated corpora", analyzes the problems of generating difficult-to-pronounce texts in an existing mobile application for diction training and describes how that generation was improved. The topic is relevant because Russian has many rules and exceptions and its word order is not fixed. As a result, existing methods for generating Russian-language text still do not give ideal results, which leaves room for further experiments. The application also has practical value: it helps users improve their diction, and well-trained speech makes a speaker quickly and easily understood by an interlocutor, which is a useful skill in the modern world. Tongue twisters can also be used to train diction, but their texts are boring, and the tongue is likely to be trained only on the words that the tongue twisters contain.

The object of the study is a stochastic method for generating difficult-to-pronounce natural-language texts using annotated corpora. The subject of the research is natural-language texts. The main objective of the thesis is to develop a system for generating difficult-to-pronounce Russian texts using a syntactically annotated corpus.

The thesis consists of an introduction, four chapters, a conclusion, and a bibliography, and is 31 pages long. The first chapter, "Available data overview", describes the statistical data, algorithm, and implementation tools used in the previous year's interdisciplinary coursework, with a brief rationale for their choice. The second chapter, "Research in previous data studies", analyzes the errors made earlier in text generation and presents ways to address them. The third chapter, "Correction of the implementation of text generation", presents the process of correcting those errors and refining the system for generating difficult-to-pronounce Russian texts using syntactically annotated corpora. The fourth chapter, "Evaluation of the results", evaluates the resulting application in three predefined categories, each with its own evaluation criteria defined in advance. The conclusion presents the main findings of the thesis.

Full text (added May 25, 2019)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
