Year of Graduation
Methods for Recognizing Terminological Variants in the Scientific Text
Applied Mathematics and Information Science
Many Natural Language Processing tasks, such as compiling glossaries and subject indexes, are labour-intensive and time-consuming when performed manually. Automatic terminology extraction, followed by grouping the extracted terms into meaningful clusters, can give a researcher a structured representation of the subject area and thus facilitate their work. The goal of this work is to examine whether an approach based on lexico-syntactic patterns is applicable to terminology clustering and to investigate how the choice of similarity measure influences the results. Terms are extracted from scientific texts using the LSPL language and then clustered with the K-Means algorithm using the Levenshtein, Monge-Elkan, or cosine similarity measure. The clusters obtained in the experiments are assessed in terms of their practical usefulness. Based on the experimental data, the work concludes that the approach could be useful in applications, and possible ways of improving it are suggested.
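To make the three similarity measures concrete, the following is a minimal Python sketch rather than the implementation used in this work. Several choices in it are assumptions made for illustration only: terms are split into tokens on whitespace, the Levenshtein distance is normalised to a similarity in [0, 1] by dividing by the longer string's length, Monge-Elkan uses that normalised Levenshtein similarity as its inner token measure, and the cosine measure is computed over bag-of-words token counts. All function names are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def monge_elkan(a: str, b: str) -> float:
    """For each token of `a`, take its best match in `b`, then average.

    Note that this measure is asymmetric in general.
    """
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = (max(levenshtein_similarity(x, y) for y in tokens_b) for x in tokens_a)
    return sum(best) / len(tokens_a)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between bag-of-words token count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Under these assumptions, the character-level Levenshtein measure is sensitive to word order, whereas Monge-Elkan and the token-based cosine measure treat reordered variants such as "neural network" and "network neural" as identical, which is one way the choice of measure can change the resulting clusters.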