
The Problem of Spelling Mistakes and Misprints Automatic Correction

Student: Ezerskaya Anastasiya

Supervisor: Anna Khomenko

Faculty: Faculty of Humanities (Nizhny Novgorod)

Educational Programme: Fundamental and Applied Linguistics (Bachelor)

Final Grade: 8

Year of Graduation: 2021

This final qualifying work addresses the problem of automatic spelling correction in IT-discourse texts. Texts written by computer specialists contain many neologisms, technical terms and other domain-specific vocabulary, and standard spellcheckers perform poorly on texts of this type. The goal of this research is to create a specialized spellchecker that corrects misspelled words in IT-discourse texts more effectively than a standard spellchecker intended for everyday discourse. To this end, two improvements to the basic spellchecker algorithm were implemented. First, a list of slang terms and their word forms was compiled to supplement the spellchecker dictionary. Second, a model reflecting the semantic connections between words of IT discourse and words of everyday discourse was trained; the spellchecker algorithm uses it to choose the final replacement from the list of candidate words. In addition, supporting tasks were carried out, such as identifying the main features of IT discourse and determining the most productive ways of forming its neologisms. In testing, the specialized spellchecker developed in this research corrected IT-discourse texts more effectively than «Yandex.Speller», by 13.3% on the F-measure metric. The difference is explained by the specialized spellchecker's ability to correct mistakes in slang words and by its ability to choose the final word from the candidate list more effectively than other spellcheckers do. The improvement in the quality of the corrected text shows that creating a specialized spellchecker for a specific area of application is a relevant approach that can significantly improve automatic correction of misspellings.
The main prospect for the specialized spellchecker is its deployment on a site that programmers use to discuss work issues. Performance on real tasks will show its actual effectiveness.
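
The two improvements described in the summary (an extended dictionary and a semantic model used to pick the final word from the candidate list) can be illustrated with a minimal sketch. This is not the thesis implementation: the dictionary, the toy co-occurrence vectors, the edit-distance threshold and all function names below are assumptions made purely for illustration, and a real system would use a trained model instead of hand-written vectors.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dictionary: everyday words plus IT slang added to it.
DICTIONARY = {"merge", "marge", "commit", "branch", "deploy", "butter", "bread"}

# Hypothetical co-occurrence vectors standing in for a trained semantic model.
VECTORS = {
    "merge": {"code": 3, "git": 2},
    "marge": {"food": 3, "kitchen": 1},
    "commit": {"code": 2, "git": 3},
    "branch": {"git": 3, "code": 1},
    "deploy": {"code": 2, "server": 3},
    "butter": {"food": 3, "kitchen": 2},
    "bread": {"food": 3, "kitchen": 1},
}


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidates(word, max_dist=1):
    """Dictionary words within max_dist edits of the misspelled word."""
    return sorted(w for w in DICTIONARY if edit_distance(word, w) <= max_dist)


def correct(word, context):
    """Return the candidate that is semantically closest to the context words."""
    if word in DICTIONARY:
        return word
    cands = candidates(word)
    if not cands:
        return word  # nothing close enough: leave the word unchanged
    context_vector = Counter()
    for c in context:
        context_vector.update(VECTORS.get(c, {}))  # sum the context vectors
    return max(cands, key=lambda w: cosine(VECTORS.get(w, {}), context_vector))


print(correct("mrge", ["commit", "branch"]))
print(correct("mrge", ["butter", "bread"]))
```

In this toy setup the misspelling "mrge" has two candidates at the same edit distance, so edit distance alone cannot decide between them; the context vector is what separates the IT reading from the everyday one.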

Full text (added June 7, 2021)

