
Student: Irina Nikishina
Title: Automatic Information Extraction about Persons from News Articles
Educational Programme: Fundamental and Applied Linguistics (Bachelor’s programme)
Final Grade: 10
Year of Graduation: 2017
The current paper is devoted to the automatic extraction of information about persons from news articles.

Automatic analysis of texts and writing genres has been of particular interest since the advent of computers. Today, Natural Language Processing (NLP) is widely used in many domains to automate the study of human language. One of its most important applications is Named-Entity Recognition (NER), whose tools locate named entities in text and classify them into predefined categories such as persons, organisations and locations.
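To make the NER step concrete, here is a deliberately naive sketch of person-name candidate extraction. It is not the method used in the thesis: it simply treats runs of two or more capitalised words as candidates, and the `STOPWORDS` set and `person_candidates` function are hypothetical names introduced for illustration. A real system would use a trained NER model.

```python
import re

# Hypothetical stop list: capitalised function words that often open a
# sentence and get glued onto a following name.
STOPWORDS = {"The", "A", "An", "On", "In", "At", "But", "And"}

# Two or more consecutive capitalised Latin-script words, e.g. "Angela Merkel".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def person_candidates(text: str) -> list[str]:
    """Return multi-word capitalised spans as naive person-name candidates."""
    found = []
    for match in CANDIDATE.finditer(text):
        words = match.group().split()
        # Strip leading function words ("The President" -> "President").
        while words and words[0] in STOPWORDS:
            words = words[1:]
        # Single remaining words are too ambiguous to keep.
        if len(words) >= 2:
            found.append(" ".join(words))
    return found


print(person_candidates("Angela Merkel met Emmanuel Macron in Berlin."))
# ['Angela Merkel', 'Emmanuel Macron']
```

The heuristic also accepts place and organisation names such as "New York", and it only matches Latin letters; Russian-language news would need Cyrillic character classes such as `[А-ЯЁ]`.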

The aim of the current study is to develop a general-purpose algorithm for the automatic analysis of newspaper articles based on social graph theory. The study addresses several tasks: collecting newspaper articles from a website, identifying the characters mentioned in an article, representing the relationships between these characters, and creating a desktop user interface.
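One common way to represent relationships between characters is a co-occurrence graph, in which two characters are linked whenever they are mentioned in the same sentence. The sketch below assumes that choice; the abstract does not say which relation the thesis uses, and the function name `cooccurrence_graph` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences: list[list[str]]) -> Counter:
    """Count how often each unordered pair of characters shares a sentence.

    `sentences` holds, for each sentence, the characters mentioned in it.
    The result maps a sorted (a, b) pair to its edge weight.
    """
    edges = Counter()
    for mentioned in sentences:
        # Deduplicate within a sentence and sort so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(set(mentioned)), 2):
            edges[(a, b)] += 1
    return edges


graph = cooccurrence_graph([["Merkel", "Macron"], ["Macron", "Merkel", "Putin"], ["Putin"]])
print(dict(graph))
# {('Macron', 'Merkel'): 2, ('Macron', 'Putin'): 1, ('Merkel', 'Putin'): 1}
```

Using the sentence as the co-occurrence window is a design choice; a paragraph or a fixed token window would give denser graphs.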

The result of the present study is a desktop application that analyses news articles with a focus on the relations between the characters they mention. The program automatically extracts characters’ names from the text and assigns each name to its cluster. Based on these data, it is possible to build a graph of the relationships between the characters and determine its main characteristics. The product could also be used in any domain related to text analysis and the identification of personal names.
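The clustering and graph-characteristics steps can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: `cluster_mentions` uses a hypothetical surname rule (a short mention such as "Merkel" is grouped with the longest mention ending in the same surname), and `graph_summary` computes two standard characteristics, density and node degree, on an undirected graph whose edges are `(a, b)` pairs. Both function names are hypothetical.

```python
def cluster_mentions(mentions: list[str]) -> dict[str, str]:
    """Map each non-empty mention to a canonical name with a naive surname rule.

    Assumption: the last token of a mention is a surname, and every mention
    sharing that surname refers to the longest full form seen for it.
    """
    longest = {}
    for m in mentions:
        surname = m.split()[-1]
        if len(m) > len(longest.get(surname, "")):
            longest[surname] = m
    return {m: longest[m.split()[-1]] for m in mentions}


def graph_summary(nodes: set[str], edges: dict[tuple[str, str], int]) -> dict:
    """Compute simple characteristics of an undirected character graph."""
    # Degree = number of distinct characters a node co-occurs with;
    # iterating sorted nodes makes max() tie-breaking alphabetical.
    degree = {v: 0 for v in sorted(nodes)}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    possible = n * (n - 1) / 2
    return {
        "density": len(edges) / possible if possible else 0.0,
        "degree": degree,
        "most_connected": max(degree, key=degree.get) if degree else None,
    }


mentions = ["Angela Merkel", "Merkel", "Emmanuel Macron", "Macron", "Putin"]
canon = cluster_mentions(mentions)
# Hypothetical edges between canonical names, e.g. from a co-occurrence count.
edges = {("Angela Merkel", "Emmanuel Macron"): 2, ("Emmanuel Macron", "Putin"): 1}
summary = graph_summary(set(canon.values()), edges)
print(summary["most_connected"], round(summary["density"], 2))
# Emmanuel Macron 0.67
```

The surname rule breaks down when two people share a surname; resolving such cases would need context (first names, titles, coreference), which this sketch does not attempt.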

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
