Improving Word Classification with Graph Structures on Semi-Structured Documents Using Neural Networks

Student: Kobelev Maksim

Supervisor: Evgeny Sokolov

Faculty: Faculty of Computer Science

Educational Programme: Applied Mathematics and Information Science (Bachelor)

Year of Graduation: 2020

Word classification is an important and common problem in automatic document processing. It arises when various information fields are extracted from a document. This study focuses on semi-structured documents, in particular cash receipts, which are characterized by high information density. The proposed method improves on an existing word classification method, treated here as the baseline model, for extracting information from such documents. The aim of the study is to develop a competitive method for extracting a graph structure from documents and to apply modified graph convolutional network techniques to the extracted data. This provides additional features at the stage where the classifier's final output is formed. We provide a separate auxiliary block that helps neural architectures perceive the spatial structure of these complex documents more effectively and improves the final quality of word classification. The work contains 17 pages, 8 chapters, 6 figures, 1 table, and 7 sources. Keywords: word classification, graph convolutional networks.
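
The general idea of extracting a graph from a document and passing word features through a graph convolution can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or which network variant is used, so everything here is an assumption made for illustration: words are represented by the centers of their bounding boxes, edges connect each word to its k nearest words, and the propagation step is a plain mean aggregation with a self-loop followed by a linear map and ReLU. The names `build_knn_graph` and `graph_conv` are hypothetical.

```python
import math


def build_knn_graph(centers, k=2):
    """Connect each word to its k nearest words.

    Assumption: `centers` holds the (x, y) center of each word's bounding box,
    and spatial closeness is measured by Euclidean distance.
    """
    n = len(centers)
    neighbors = []
    for i in range(n):
        # Sort all other words by distance to word i (ties broken by index).
        dists = sorted(
            (math.dist(centers[i], centers[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def graph_conv(features, neighbors, weight):
    """One graph convolution step (illustrative mean-aggregation variant).

    Each word's features are averaged with those of its neighbors (including
    the word itself), multiplied by `weight`, and passed through ReLU.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + nbrs  # self-loop plus neighbors
        dim = len(features[i])
        mean = [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        projected = [
            max(0.0, sum(mean[d] * weight[d][o] for d in range(dim)))
            for o in range(len(weight[0]))
        ]
        out.append(projected)
    return out


# Three words on one receipt line: two close together, one far to the right.
centers = [(0, 0), (1, 0), (10, 0)]
graph = build_knn_graph(centers, k=1)
smoothed = graph_conv([[1.0], [3.0], [5.0]], graph, [[1.0]])
```

In a setup like this, the output of `graph_conv` would serve as additional per-word features that a classifier could combine with its own representation before producing the final label.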

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
