Methods of Automatic Morphological Analysis for Russian Texts

Student: Podledneva Elena

Faculty: School of Applied Mathematics and Information Science

Educational Programme: Bachelor

Year of Graduation: 2014

The paper addresses the problems of automatic morphological analysis of texts written in Russian. It reviews different approaches to morphological analysis along with their advantages and drawbacks, and considers two morphological analyzers for Russian that are able to perform full morphological analysis. The study explores methods of automatic morphological analysis and ways of resolving morphological ambiguity, and investigates the two analysis modules on experimental texts.

The following goals were achieved: a method based on a set of context rules that remove irrelevant results from the output was chosen; context rules that partially resolve morphological ambiguity were compiled and implemented; and experiments were conducted to assess the effectiveness of applying these rules to the tested analyzers.

To solve these problems, it was necessary to design a program that compares the behaviour of the analyzers on large data sets, as well as a program for partial resolution of morphological ambiguity. To test the analyzers, a corpus of annotated texts with resolved ambiguity from the OpenCorpora project was used. Testing showed that the precision of the Mystem and MorphanCrossLexica analyzers was around 20.1% and 26.3% respectively, while the completeness was 82.6% and 46.64% respectively. The results show that the effectiveness of applying context rules depends strongly on how completely the analyzer's vocabulary covers the text: the lower the vocabulary coverage, the less effective the context rules are.
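
The relationship between precision, completeness and context rules can be illustrated with a minimal sketch. It is not the program developed in the thesis: the tag names, the data layout, the function names and the single rule shown are all assumptions made for illustration, and "completeness" is taken here to mean recall against the gold annotation. The sketch shows why an analyzer that returns several candidate parses per word can have low precision, and how a context rule raises precision only for words the analyzer covers.

```python
# Hypothetical data model: for each token the analyzer returns a set of
# candidate tags, and the disambiguated corpus supplies one gold tag.

def precision_recall(candidates, gold):
    """Precision = correct parses / all proposed parses;
    recall = tokens whose gold tag was proposed / all tokens."""
    proposed = sum(len(c) for c in candidates)
    correct = sum(1 for c, g in zip(candidates, gold) if g in c)
    precision = correct / proposed if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def drop_verb_after_preposition(candidates):
    """Illustrative context rule (an assumption, not taken from the thesis):
    a word directly after an unambiguous preposition is not read as a verb."""
    result = [set(c) for c in candidates]
    for i in range(1, len(result)):
        if result[i - 1] == {"PREP"}:
            filtered = result[i] - {"VERB"}
            if filtered:  # never remove the last remaining parse
                result[i] = filtered
    return result


# Toy input: a preposition, a noun/verb homonym, a noun/adjective homonym.
candidates = [{"PREP"}, {"NOUN", "VERB"}, {"NOUN", "ADJ"}]
gold = ["PREP", "NOUN", "NOUN"]

before = precision_recall(candidates, gold)
after = precision_recall(drop_verb_after_preposition(candidates), gold)
print(before, after)
```

In this toy input the rule removes one wrong parse, so precision rises while recall is unchanged. If the analyzer proposes no correct parse for a word because the word is missing from its vocabulary, no filtering rule can recover it, which is consistent with the dependence on vocabulary coverage reported above.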
