
Analyzing Linguistic Representativeness of Distributional Semantic Models on the Lexical Level

Student: Bakarov Amir

Supervisor: Andrey Borisovich Kutuzov

Faculty: Faculty of Humanities

Educational Programme: Computational Linguistics (Master)

Final Grade: 9

Year of Graduation: 2019

Distributional semantic models such as Word2Vec are among the most ubiquitous tools in Natural Language Processing. However, it remains unclear how to optimise such models for specific tasks and how to evaluate them in a general setting, that is, with respect to their ability to be applied successfully to any language task. In this work I argue that the benefits of intrinsic evaluation of distributional semantic models are questionable, since the notion of their "general quality" possibly does not exist. My hypothesis is that distributional semantic models are merely trained to solve certain tasks (language modelling, sequence labelling, etc.) and therefore do not reflect the structure of the lexical level of human language; consequently, they are not representative of language, in the sense that they cannot be extrapolated to language patterns not observed in the training set. In this thesis I try to formalise the notion of linguistic representativeness on the lexical level empirically and to find out how the inner mechanisms of the distributional hypothesis conflict with this notion. I propose experiments with an extension of distributional semantic models generalised to a set of multiple languages and use them to test the hypothesis. The results support the hypothesis on a sample of several distributional semantic models, languages and benchmarks, and hence call into question the representativeness of distributional semantics, at least in its current state.

Full text (added May 29, 2019)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
