
Cross-Functional Morpho-Syntactic Features in Stylometry

Student: Pimonova Elena

Supervisor: Oleg Durandin

Faculty: Faculty of Humanities (Nizhny Novgorod)

Educational Programme: Fundamental and Applied Linguistics (Bachelor)

Final Grade: 10

Year of Graduation: 2020

This study addresses the task of authorship attribution. The material consists of texts by Russian and English classic authors of the 17th to 20th centuries: the Russian corpus contains 324 texts by 30 authors, and the English corpus contains 207 texts by 34 authors. To classify texts by author, we propose our own linguistic models of text representation based on morpho-syntactic features of English and Russian grammar. The simple morphological and syntactic models are built on the frequencies of parts of speech and syntactic relations identified by the UDPipe parser. The sophisticated morphological model adds criteria for morphological and semantic analysis. In the sophisticated syntactic model, linguistic phenomena are categorized at the level of phrases and sentences. Overall, the proposed morpho-syntactic models perform well: in 20 out of 24 experiments, they outperform the baseline Doc2Vec model. The best accuracy obtained by a combination of morpho-syntactic features is 85% for Russian and 80% for English. The best result overall, 90% for Russian and 96% for English, is achieved when the morpho-syntactic models are used together with Doc2Vec. This demonstrates that morpho-syntactic models of text representation can be successfully applied to authorship attribution, especially since they give a fully interpretable result. The interpretability is confirmed by an analysis of errors and of important attributes: the error analysis revealed patterns of stylistic similarity between authors, and the analysis of important attributes helped to identify universal and language-specific style-forming attributes for Russian and English.
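The simple models mentioned in the summary rely on frequencies of parts of speech and syntactic relations. The following is only a minimal sketch of that idea, not the thesis's actual implementation: it assumes the parser output is available in CoNLL-U format (the tab-separated format UDPipe produces), and the function name `relative_frequencies` and the sample sentence are hypothetical, introduced purely for illustration.

```python
from collections import Counter

# Column indices in the 10-column CoNLL-U format.
UPOS_COLUMN = 3    # universal part-of-speech tag
DEPREL_COLUMN = 7  # dependency relation to the head


def relative_frequencies(conllu_text, column):
    """Return {label: share of tokens} for one CoNLL-U column.

    Hypothetical helper: illustrates a frequency-based text
    representation, not the feature set used in the thesis.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        counts[cols[column]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hand-written sample in CoNLL-U format (an assumption, not real parser output).
SAMPLE = "\n".join([
    "# text = She reads books.",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

pos_features = relative_frequencies(SAMPLE, UPOS_COLUMN)
rel_features = relative_frequencies(SAMPLE, DEPREL_COLUMN)
```

Each text would then be represented by such a dictionary of shares, which a classifier could consume after the keys are mapped to a fixed feature order. Because every feature is a named grammatical category, the representation stays directly interpretable.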

Full text (added June 6, 2020)

Student theses at HSE must be completed in accordance with the University rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
