A Recognition System for Cursive English Handwritten Text

Student: Saltykova Margarita

Supervisor: Maria Veretennikova

Faculty: Faculty of Economic Sciences

Educational Programme: Statistical Modelling and Actuarial Science (Master)

Year of Graduation: 2020

Recognizing cursive handwritten text is one of the main tasks in text recognition. Each person's handwriting is unique: the same letter may differ completely in style, size, shape, and slant from one writer to another. In addition, the letters in cursive words are connected. For this reason, segmenting a word into letters and then recognizing the individual characters gives lower recognition quality on cursive text than on hand-printed text. In this work, hidden Markov models and convolutional-recurrent neural networks were used to recognize cursive text. These algorithms decode a line of text without explicitly segmenting it into words and letters; an implicit segmentation of the line is a by-product of decoding. A variational autoencoder was considered as a feature extraction method for the hidden Markov model. It maps the input vector into a lower-dimensional space, modeling the distribution of the latent variables rather than the latent variables themselves. The randomness this introduces into feature extraction regularizes the model. The hidden Markov models did not achieve good recognition results: the algorithm is resource-intensive, so only one training iteration was carried out. Nevertheless, it is shown that a variational autoencoder can be used as a feature extractor for hidden Markov models. The convolutional-recurrent neural networks gave good recognition results. By the Levenshtein distance metric, the model's predictions are 95% similar to the correct transcriptions, and the model recognizes 46% of the text lines completely.
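The two evaluation figures quoted in the abstract, a Levenshtein-based similarity and the share of fully recognized lines, can be illustrated with a minimal sketch. The abstract does not say how the edit distance was normalized, so dividing by the length of the longer string is an assumption here, and the function names `levenshtein`, `similarity`, and `evaluate` are hypothetical, not taken from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(truth: str, pred: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(truth), len(pred))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(truth, pred) / longest


def evaluate(truths, preds):
    """Return (mean similarity, fraction of lines recognized exactly)."""
    pairs = list(zip(truths, preds))
    mean_sim = sum(similarity(t, p) for t, p in pairs) / len(pairs)
    exact = sum(t == p for t, p in pairs) / len(pairs)
    return mean_sim, exact
```

Under this definition a line with one wrong character still contributes a high similarity but counts as a miss for the exact-match rate, which is one plausible reason the two reported figures (95% and 46%) differ so much.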

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
