Automatic Detection of Gender Identity: the Phenomenon of Russian Women's Prose in Literature of the Late XX Century

Student: Khazova Anastasiya

Supervisor: Boris Orekhov

Faculty: Faculty of Humanities

Educational Programme: Language Theory and Computational Linguistics (Master)

Year of Graduation: 2017

This research examines methods for automatically determining the gender identity of authors, using prose written between 1960 and 2000 as material. Its purpose is to identify the optimal methods for this task. The objectives are to describe the grammatical and stylistic features of prose from 1960 to 2000, in particular women's prose, and of texts of the XVIII-XIX centuries; to trace changes in the distribution of parts of speech and punctuation marks over the specified period; and to conduct a machine learning experiment to identify the most effective algorithm for classifying literary texts. The analysis revealed that the parts of speech women and men use most often in their texts are nouns, verbs, prepositions, pronominal nouns, conjunctions, and adjectives, which reflects the specifics of literary style. In addition, the use of the most common punctuation marks was analyzed: question mark, exclamation point, comma, colon, semicolon, and period. Women were observed to use punctuation more actively as a means of expression in modern literature: the share of exclamation marks, question marks, and commas in texts by women writers is much higher than the value obtained from men's texts. The work also analyzes the distribution of parts of speech and punctuation in literary texts by men and women of the XVIII-XIX centuries. An experiment was performed to identify the most effective algorithm for determining the gender identity of the author. The most effective classifiers for literary texts were found to be the implementations of the BayesNet and SMO algorithms.
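The punctuation analysis described above can be illustrated with a minimal Python sketch. The abstract does not say how the shares were computed, so the choices here are assumptions: the function name `punctuation_shares` is hypothetical, and each mark's share is taken relative to all counted punctuation marks in the text rather than to the number of words or characters.

```python
from collections import Counter

# Punctuation marks named in the abstract.
PUNCTUATION = ["?", "!", ",", ":", ";", "."]


def punctuation_shares(text: str) -> dict[str, float]:
    """Return each mark's share of all counted punctuation marks in the text.

    Assumption: the denominator is the total number of listed marks,
    not the number of words or characters.
    """
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values())
    if total == 0:
        # No listed marks in the text: every share is zero.
        return {mark: 0.0 for mark in PUNCTUATION}
    return {mark: counts[mark] / total for mark in PUNCTUATION}


# Illustrative sentence, not taken from the thesis corpus.
sample = "Who is there? Nobody answered, and the door stayed shut!"
shares = punctuation_shares(sample)
```

Share vectors of this kind, computed per text and combined with part-of-speech shares, are one plausible form of input for a text classifier.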
