Methods of Automatic Morphological Analysis for Russian Texts
School of Applied Mathematics and Information Science
This paper addresses the automatic morphological analysis of Russian texts. It surveys the main approaches to morphological analysis together with their advantages and drawbacks, and examines two morphological analyzers for Russian that are capable of full morphological analysis.

The study explores methods of automatic morphological analysis and ways of resolving morphological ambiguity, and investigates the two analyzers on experimental texts. The following goals were achieved: a method based on a set of context rules that remove irrelevant results from the analyzer output was chosen; context rules that partially resolve morphological ambiguity were compiled and implemented; and experiments were conducted to assess the effectiveness of applying these rules to the tested analyzers.

To accomplish this, two programs were developed: one that compares the behavior of the analyzers on large data sets, and one that performs partial resolution of morphological ambiguity. The analyzers were tested on a corpus of annotated texts with resolved ambiguity from the OpenCorpora project. Testing showed that the precision of Mystem and MorphanCrossLexica was about 20.1% and 26.3% respectively, while their completeness was 82.6% and 46.64% respectively. The results show that the effectiveness of the context rules depends strongly on the completeness of the analyzer's vocabulary coverage: the lower the coverage, the less effective the context rules are.
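The interplay between context rules, precision, and completeness described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the tag names, the single rule (drop verbal readings right after an unambiguous preposition), the toy three-token input, and the metric definitions (precision as correct readings over all proposed readings, completeness as the share of tokens whose correct reading is among those proposed). The definitions actually used in the experiments may differ.

```python
from typing import List, Set, Tuple

# Hypothetical tag names, chosen only for this illustration.
VERBAL_TAGS = {"VERB", "INFN"}


def precision_completeness(predicted: List[Set[str]],
                           gold: List[str]) -> Tuple[float, float]:
    """Score ambiguous analyzer output against a disambiguated gold standard.

    Assumed definitions (not confirmed by the source):
      precision    = tokens with the gold tag proposed / all proposed tags
      completeness = tokens with the gold tag proposed / all tokens
    """
    proposed = sum(len(tags) for tags in predicted)
    hits = sum(1 for tags, g in zip(predicted, gold) if g in tags)
    precision = hits / proposed if proposed else 0.0
    completeness = hits / len(gold) if gold else 0.0
    return precision, completeness


def apply_context_rule(predicted: List[Set[str]]) -> List[Set[str]]:
    """Hypothetical context rule: directly after a token analyzed only as a
    preposition, discard verbal readings, provided another reading remains."""
    result = [set(tags) for tags in predicted]  # do not mutate the input
    for i in range(1, len(result)):
        if result[i - 1] == {"PREP"}:
            filtered = result[i] - VERBAL_TAGS
            if filtered:  # never delete the last remaining reading
                result[i] = filtered
    return result


# Toy input: a preposition, a noun/infinitive homonym, and a word that is
# missing from the analyzer's vocabulary (no readings at all).
predicted = [{"PREP"}, {"NOUN", "INFN"}, set()]
gold = ["PREP", "NOUN", "VERB"]

before = precision_completeness(predicted, gold)
after = precision_completeness(apply_context_rule(predicted), gold)
print("before rules:", before)
print("after rules: ", after)
```

On this toy input the rule raises precision from 2/3 to 1 by removing the spurious infinitive reading, while completeness stays at 2/3: a rule that only filters readings cannot supply an analysis for a word the vocabulary does not cover.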