
A Russian-Language Text Summarization System

Student: BUKANOVA OL`GA

Supervisor: Alexey Malafeev

Faculty: Faculty of Humanities (Nizhny Novgorod)

Educational Programme: Fundamental and Applied Linguistics (Bachelor)

Year of Graduation: 2017

As information technologies develop, the time needed to find information of interest grows, which reduces the user's personal effectiveness. Under these conditions, automatic summarization becomes increasingly relevant: this area of natural language processing presents the essential information from source texts in condensed form, which saves time. In this research, a program was developed that generates a summary from collections of news articles. The resulting summary is a document that contains the relevant information appearing repeatedly across the collection, stated without repetition, together with additional information specific to individual texts in the collection. To determine the informative content of a text correctly, a corpus of informative words for sports news was created. The articles were collected with crawlers, which save the text of each article and all the necessary metadata in a database. To analyze the algorithm's performance, articles from "RIA Novosti", "Interfax", "Korrespondent" and other sources were used. Each summary was evaluated, and these evaluations served as a tool for determining the quality of the algorithm's work.
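The abstract does not give the algorithm's details, so the following is only a minimal illustrative sketch of multi-document extractive summarization in the spirit described above, not the method used in the thesis. It assumes that sentences are scored by how many articles in the collection share their words, that words from an optional lexicon (the hypothetical `informative_words` parameter) receive a hypothetical `boost`, and that repeats are filtered with a Jaccard-overlap threshold. All function names, parameters, and thresholds here are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Return lowercase word tokens (Latin or Cyrillic letters and digits)."""
    return re.findall(r"[a-zа-яё0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Jaccard overlap between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(documents, informative_words=frozenset(), max_sentences=3,
              redundancy_threshold=0.5, boost=2.0):
    """Pick high-scoring, mutually dissimilar sentences from a collection.

    Assumed scoring (not taken from the thesis): the mean document frequency
    of a sentence's words, with words from `informative_words` multiplied by
    `boost`.
    """
    # Document frequency: in how many articles each word occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))

    candidates = []
    for doc in documents:
        for sentence in split_sentences(doc):
            words = set(tokenize(sentence))
            if not words:
                continue
            total = sum(
                doc_freq[w] * (boost if w in informative_words else 1.0)
                for w in words
            )
            candidates.append((total / len(words), sentence, words))

    # Highest score first; the sort is stable, so ties keep input order.
    candidates.sort(key=lambda c: c[0], reverse=True)

    selected = []
    for _, sentence, words in candidates:
        # Skip a sentence that overlaps too much with one already chosen.
        if all(jaccard(words, chosen) < redundancy_threshold
               for _, chosen in selected):
            selected.append((sentence, words))
        if len(selected) == max_sentences:
            break
    return [sentence for sentence, _ in selected]
```

Called on a list of article texts, `summarize` returns at most `max_sentences` sentences: those built from words shared across the collection rank first, near-duplicates of an already chosen sentence are skipped, and the remaining slots go to sentences that carry information specific to a single article. Averaging over sentence length favors short sentences, which is a simplification a real system would likely revisit.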

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
