2. Research objectives. The strategic goal is to set up and develop linguistic corpora: a set of academic writing corpora (the Russian Academic Writing Corpus, the English Academic Writing Corpus and the Corpus of Translations from English), as well as the Russian Heritage Corpus, the Blog Writing Corpus and the Regional Russian Corpus. These corpora provide data for multi-stage, multi-factor typological research into speech errors and systematic deviations from the standard as such tendencies take shape in contemporary non-standard Russian. The results are then compared with data from the Russian National Corpus, which in turn sheds light on the lexical and grammatical development of Russian.
3. Research Empirical Basis. The corpora mentioned above are based, respectively, on written assignments submitted by students of the Higher School of Economics in their academic courses; on oral and written speech samples from students learning Russian as a foreign language (submitted with the students’ consent); on Internet blogs; and on field recordings made in an area where a regional variety of Russian is spoken.
4. Research Results. Strategies and methods for collecting material have been worked out for each corpus; models of metadata tagging have been designed; search tools for retrieving the relevant entries from the corpora have been built; a categorization of errors has been introduced. New registers of Russian speech have been proposed, which may pave the way to restructuring the lexical and grammatical standards of modern Russian.
5. Stages in results application
5.1. The following corpora of “non-standard speech” have been set up:
5.1.1. Russian Heritage Corpus is a collection of texts in Russian produced by the children of emigrants from Russia (speakers of “heritage” Russian). This variety differs in lexicon and grammar both from mainland Russian and from the Russian acquired by speakers of other languages who study it as a foreign language. The deviations can be accounted for not only, and not primarily, by interference between the dominant language and the heritage language, but rather by the application of specific rules that are inherent in the system of Russian yet undeveloped, or only partially developed, in the mainland language. The corpus contains essays, timed responses and free responses to task questions. Tagging identifies the genre and the author, which makes it possible to relate the language evidence to the author's stated level of proficiency.
5.1.2. Regional Russian Corpus (Dagestan) is based on transcribed recordings of interviews with inhabitants of several villages in Dagestan (9 recordings of speakers with different first languages (L1) from 4 villages). The regional variety shows lexical and grammatical features that are typologically related to the surrounding languages but are not limited to calques.
5.1.3. For the Blog Writing Corpus, methods of collecting data from the Russian-language segment of the Internet have been developed for different formats (message or comment), taking into account their coexistence with other electronic media and the strong emphasis on the visual components of information. The main blogs (3422 in total), whose users belong to groups differing in social background, age and linguistic ability, have been identified, and a total of 38.5 million words has been collected.
5.1.4. Russian Academic Writing Corpus is a collection of texts produced by Russian students of the Higher School of Economics (Bachelor’s and Master’s programmes in different departments) in the course of Academic Writing in Russian. The main text types are theses, essays, annotations, autobiographies and replies to questions set within the course. The texts were collected in the 2012-2013 academic year and total about 1.3 million words.
5.1.5. English Academic Writing Corpus (100 essays of about 50,000 words in total, 1,346 mistakes) and Corpus of Translations from English (500 texts of about 400,000 words) both consist of texts written in English by Russian students (essays, reviews and abstracts in the former, translations in the latter), with mistakes tagged, corrected and commented on. The argumentative and descriptive essays were written assignments given to students at Upper Intermediate level in the General English course, while the reviews and abstracts were assignments given in the course of Academic Writing in English.
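As an illustration only, the error tagging described for the corpora above (each mistake tagged, corrected and commented on) could be represented roughly as in the following Python sketch. The class names, field names and the category label are hypothetical and are not taken from the project's actual tagging scheme. The density calculation assumes that the figure of about 50,000 for the English Academic Writing Corpus is a word count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorTag:
    """One tagged mistake (hypothetical structure, not the project's schema)."""
    start: int          # character offset where the erroneous span begins
    end: int            # character offset just past the erroneous span
    category: str       # illustrative error label, e.g. "agreement"
    correction: str     # text that should replace the span
    comment: str = ""   # free-form annotator comment


@dataclass
class TaggedText:
    """A student text with genre metadata and its error tags."""
    text: str
    genre: str
    errors: List[ErrorTag] = field(default_factory=list)

    def corrected(self) -> str:
        """Apply all corrections, working right to left so that
        earlier offsets stay valid after each replacement."""
        result = self.text
        for tag in sorted(self.errors, key=lambda t: t.start, reverse=True):
            result = result[:tag.start] + tag.correction + result[tag.end:]
        return result


def errors_per_thousand_words(error_count: int, word_count: int) -> float:
    """Error density normalised to 1,000 words."""
    return 1000 * error_count / word_count


# Invented example sentence, not taken from the corpus.
sample = TaggedText(
    text="She go to school every days.",
    genre="essay",
    errors=[
        ErrorTag(4, 6, "agreement", "goes", "subject-verb agreement"),
        ErrorTag(23, 27, "number", "day", "singular after 'every'"),
    ],
)
corrected_sample = sample.corrected()

# Figures reported for the English Academic Writing Corpus.
density = errors_per_thousand_words(1346, 50000)
```

Under these assumptions, the reported figures correspond to roughly 27 tagged mistakes per 1,000 words, a normalised measure that would allow comparison across corpora of different sizes.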