
Revealing Diversity and Specialization in Scientific Publications by Natural Language Processing Methods

Student: Sartakov Vasily

Supervisor: Yury Dranev

Faculty: Institute for Statistical Studies and Economics of Knowledge

Educational Programme: Science, Technology and Innovation Management and Policy (Master)

Year of Graduation: 2016

This work examines the division of cognitive labour and two important aspects of science as a phenomenon: diversity and specialization. It demonstrates that models of the division of cognitive labour can be used to model scientific activity, which in turn can serve as a basis for developing new methods of governing science, technology, and innovation. This requires new tools for recovering the configuration of science and for extracting patterns in the research strategies of participants in the scientific process, because existing bibliometric methods, such as citation and co-citation analysis, cannot be applied in real time. In this work I propose using natural language processing techniques as a tool for recovering the structure of scientific disciplines. I demonstrate, and confirm by qualitative assessment, that the TF-IDF algorithm combined with automatic k-means++ clustering can be used to construct quantitative methods for recovering the "specialization" and "diversity" of a scientific field from the raw texts of its publications.
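
As an illustration of the kind of pipeline the summary describes, the following is a minimal sketch, not the thesis's actual implementation. It weights terms with TF-IDF, seeds clusters with k-means++, and refines them with standard Lloyd iterations. The toy corpus, the function names, and the use of Shannon entropy over cluster shares as a stand-in for "diversity" are all assumptions made for this example; the summary does not state how the thesis defines its measures.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one L2-normalised sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeanspp_init(vectors, k, rng):
    """k-means++ seeding: sample each new centre proportionally to its
    squared distance from the nearest centre chosen so far."""
    centers = [rng.choice(vectors)]
    while len(centers) < k:
        d2 = [min(sq_dist(v, c) for c in centers) for v in vectors]
        if sum(d2) == 0:  # all points coincide with existing centres
            centers.append(rng.choice(vectors))
        else:
            centers.append(rng.choices(vectors, weights=d2, k=1)[0])
    return centers


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, seed=0, iters=20):
    """Lloyd iterations started from a k-means++ seeding; returns cluster labels."""
    rng = random.Random(seed)
    centers = kmeanspp_init(vectors, k, rng)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = mean_vector(members)
    return labels


def cluster_entropy(labels):
    """Shannon entropy of cluster shares (an assumed proxy for diversity)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


# Hypothetical toy corpus standing in for raw publication texts.
docs = [
    "gene protein cell expression",
    "protein cell gene sequencing",
    "market price inflation policy",
    "inflation market policy bank",
]
vectors = tfidf(docs)
labels = kmeans(vectors, k=2, seed=0)
print(labels, cluster_entropy(labels))
```

In this sketch a field whose publications spread evenly over many clusters scores a high entropy, while a field concentrated in one cluster scores zero. Because k-means depends on its random seeding, a real analysis would typically repeat the clustering with several seeds and keep the best run.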

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
