Year of Graduation
The Analysis of Methods of Authorship Verification Based on Short Russian Texts
Fundamental and Computational Linguistics
Automatic authorship attribution is the analysis of a text by statistical, mathematical, and computational methods in order to determine its author. Over the years, a wide variety of features and methods have been proposed, but the vast majority of this work has been devoted to fiction. With the emergence of the Internet and the spread of all kinds of electronic texts (for instance, messages in social networks), interest in attributing such texts has arisen. Many recent studies therefore evaluate the effectiveness of existing methods and features on this new type of text. This paper tests two methods, the support vector machine (SVM) and the dissimilarity method, on news articles from a popular Internet journal and on tweets in Russian. The main features are n-gram (bigram and trigram) frequencies, supplemented by additional features: conjunction and parenthesis frequencies and elementary syntactic features.
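To make the main feature concrete, the following is a minimal sketch of how bigram and trigram frequencies might be computed for a short text. It assumes character-level n-grams, lowercasing, and relative (normalized) frequencies; the abstract does not say whether the n-grams are built from characters or words, and the function names `ngram_frequencies` and `profile` are hypothetical, not taken from the paper.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> dict[str, float]:
    """Return relative frequencies of character n-grams in `text`.

    Assumption: character-level n-grams; the paper may use another unit.
    """
    # Slide a window of width n over the text, one character at a time.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        # Text shorter than n has no n-grams.
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def profile(text: str) -> dict[str, float]:
    """Merge bigram and trigram frequencies into one feature mapping.

    Assumption: the text is lowercased before counting.
    """
    features: dict[str, float] = {}
    for n in (2, 3):
        # Keys of different lengths cannot collide, so a flat dict is safe.
        features.update(ngram_frequencies(text.lower(), n))
    return features
```

A mapping like this, computed per text, could then be turned into a fixed-length vector over a shared n-gram vocabulary before being passed to a classifier such as an SVM. Normalizing by the number of n-grams keeps texts of different lengths comparable, which matters for inputs as short as tweets.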