Named Entity Recognition Methods Research

Student: Machnev Alexey

Supervisor: Sergey Makarov

Faculty: Faculty of Computer Science

Educational Programme: Software Engineering (Bachelor)

Year of Graduation: 2019

Many companies currently face the problem of extracting characteristic values from stock item names, and there is still no common solution to it. The problem resembles named entity recognition in text and is related to the problem of embedding texts and parts of texts. There are papers on these topics; however, they analyze natural-language text, which can easily be split into words, most of which occur more than once and from which a dictionary can be built. This makes it impossible to apply the results of those papers directly to the problem described here. However, stock item names also consist of symbols and share a common structure, and, as in text, there are hidden connections between symbols. Therefore, the methods and algorithms described in those papers could be adapted to the problem of characteristics extraction. This work investigates that approach. The report comprises 41 pages, 3 equations, 17 figures, 1 table, and 2 appendices, and cites 41 sources. Keywords: characteristics extraction, machine learning methods, text analysis, classification, information retrieval.

