Northern Caucasus: aspects of linguistic and ethnic diversity

Priority areas of development: humanities
Department: Laboratory of the Caucasian Languages
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goals of the project: Within the project, we study the language structures and genetic diversity of various ethnic groups in the Caucasus. This includes genetic studies of Tabasarans within their historical settlement area; online phonetic dictionaries of the minority languages of the Northern Caucasus; synchronic studies in various domains of the morphology and syntax of the languages of the Caucasus, as well as their diachronic analysis; and the study of various sociolinguistic aspects of the language situation, namely collecting data on multilingualism and language convergence.

Collection of empirical data included:

  • collection and analysis of genetic samples from various locations in Daghestan, controlled for kin relations and affiliation of the donor with one of the patrilineal clans;
  • collection of sociolinguistic data (multilingualism questionnaires, lists of lexical and structural borrowings, and other materials);
  • documentation of minority languages;
  • fieldwork and corpus-based collection of grammatical data.

Methods used

The project uses properly linguistic, sociolinguistic and genetic methods of data analysis. Linguistic methods stricto sensu include corpus-based studies and comparative diachronic and synchronic analysis of grammar. The study of the semantics of localization markers is based on the classification proposed by Dmitri Ganenkov (Ganenkov 2008, 2010). The study of the language situation in Daghestan is based on sociolinguistic questionnaires focused on multilingualism (Dobrushina 2013). Sociolinguistic methods thus include interviewing respondents about historical patterns of multilingualism in Daghestan by eliciting family histories of bilingualism. Genetic methods are focused on patrilineal kinship and thus only use genetic material from Y-DNA; we use the same panel of 20 microsatellite sequences for all donors to determine patrilineal connections between them.


Because the project is multidisciplinary, it draws on data of diverse kinds. Genetic research is based on genetic samples obtained from male donors (the analysis being limited to Y-DNA, as explained above). Another source of data is our fieldwork and corpora, which provide data on the morphology and syntax of the languages of the area. We also collect wordlists from the same languages, both to establish mutual distances between the languages and to determine the intensity of language contact. Finally, the main source of empirical data on multilingualism is oral interviews with highlanders.
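
The report does not say how distances between wordlists were computed. As a purely illustrative sketch, one common option is the mean normalized Levenshtein (edit) distance over the concepts two wordlists share. The language labels and word forms below are invented placeholders, not project data, and all function names are hypothetical.

```python
from itertools import combinations

# Invented placeholder wordlists (concept -> transcribed form); NOT project data.
WORDLISTS = {
    "lang_A": {"water": "lami", "hand": "kota", "stone": "rusu"},
    "lang_B": {"water": "lani", "hand": "kota", "stone": "rusa"},
    "lang_C": {"water": "xin", "hand": "ketu", "stone": "qarq"},
}


def levenshtein(a, b):
    """Edit distance: minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer form."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def wordlist_distance(list1, list2):
    """Mean normalized edit distance over the concepts present in both lists."""
    shared = sorted(set(list1) & set(list2))
    if not shared:
        raise ValueError("the wordlists share no concepts")
    return sum(normalized_distance(list1[c], list2[c]) for c in shared) / len(shared)


if __name__ == "__main__":
    for x, y in combinations(sorted(WORDLISTS), 2):
        print(x, y, round(wordlist_distance(WORDLISTS[x], WORDLISTS[y]), 3))
```

A measure of this kind does not by itself separate inherited similarity from borrowing; assessing the intensity of contact would additionally require marking which items are loans.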


During 2015, the following outcomes were obtained.

We have carried out a study of diachronic change in the strategies of infinitival complementation in the Udi languages (from Caucasian Albanian, the ancestral language, to the modern dialects). We have considered three matrix predicates, ‘know, be able’, ‘start’ and ‘want’, together with the subordination strategies they allow. The research is based on the textual data available for these languages. Diachronically, these data show that both morphosyntactic and lexical changes in these constructions comply with the Complement Deranking Hierarchy (Cristofaro 2003).

We have analyzed the semantics of the locative marker -q- in Andi, Akhvakh, Godoberi and Chamalal. The meaning of this morpheme is usually defined as Apud. However, the examples given by the authors of the existing grammars suggest that this label does not capture the true or the full range of uses of the morpheme. The grammars themselves do not provide sufficient evidence to suggest an alternative analysis. For Akhvakh, Andi and Godoberi, dictionaries have been used as sources. For Andi, we have also used the data collected during the 2015 field trip to the villages of Zilo and Rikwani, Botlikh district of the Republic of Daghestan. Based on these data, we provide an analysis of the semantics of this localization, suggesting that its spatial uses are transitional from Apud to prolative.

During the same field trip, we also collected preliminary data on the uses of the existential particle ži in the Rikwani dialect of Andi. Unlike the morphology of the Andic languages, their syntax has been poorly studied (the exceptions are Akhvakh, with a series of articles by Denis Creissels, and Bagvalal, described in detail by Aleksandr Kibrik and his team, as well as Godoberi, closely related to Andi and described by the same authors). Our data were collected through elicitation informed by typological studies of copular constructions in the languages of the world.

We have collected data on and described the lexical domain of temperature in modern Eastern Armenian. The research was conducted within the methodological framework of lexical typology and is based on corpus data. Eastern Armenian has adjectival terms for tactile temperature, verbal terms for personal temperature and nominal terms for ambient temperature. As shown in the study, this part-of-speech distribution has non-trivial correlates in derivational morphology. The hot domain is more lexically diversified, with a long list of non-derived nouns and adjectives and a special class of ‘fire temperatures’ (in addition to tactile, personal and ambient temperatures, which are better attested cross-linguistically). Nevertheless, corpus statistics (from www.eanc.net) indicate that Eastern Armenian has a neatly articulated triangular system: a single lexical item dominates the hot domain, both tactile and ambient, whereas the cold domain has two almost equally frequent lexical items, more or less specialized for the tactile vs. the ambient domain. A brief survey of temperature terms shows that only a few of the words currently used for temperature have robust Indo-European etymologies. New words, often of unknown origin, appear in the language, some of them pushing the previously dominant lexical item for ‘hot’ into the metaphorical domain.

A large amount of audio data from the villages of Mehweb, Tsukhta, Balkhar and Rikwani has been processed and annotated. The files have been annotated in Praat using the Praat scripting language. The final objective is to build a phonetic database of East Caucasian stops. A systematic instrumental acoustic analysis of the sound inventories of East Caucasian languages, which are among the richest in the world, has never been done before. With some exceptions, the detailed analysis of these inventories in (Kibrik, Kodzasov 1990) is mostly based on perceptual analysis. The results and the design of the research were discussed with Ian Maddieson, an outstanding phonetician and a professor at the universities of Berkeley and Albuquerque, during his visit to HSE this fall. We are discussing possible ways for him to participate further in the project.

We have studied multilingualism in three clusters of villages in highland Daghestan. The first cluster includes the villages of Shangoda, Uri and Mukar, whose inhabitants speak Avar and Lak. The second cluster includes the villages of Rikwani, Zilo and Kizhani, whose inhabitants speak Andi and Avar; historically, Chechen was also present here. The third cluster includes Quli, Balxar, Culikana and Shukty in Akusha district. In Shukty the first language is Dargwa, while in the other three villages it is Lak. Outside the research done by Nina Dobrushina, multilingualism in Daghestan has not been subjected to quantitative analysis.

Daghestanians have a very clear idea of who belongs to which patrilineal clan, even when they are unaware of the exact family relationships between clan members. We conducted a study aimed at comparing this social knowledge with the actual genetic affiliation as seen in our genetic samples. We took the minimal number of evolutionary events necessary for a transition between the sequences of loci observed in two individuals as the genetic distance between them, and compared these distances pair-wise within clans, between clans and between villages. Statistically, the difference between the first set of pairs (within clans), on the one hand, and the second and third sets of pairs (between clans and between villages), on the other, is highly significant and clearly supports the social idea of affiliation of individuals with patrilineal clans.
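
The comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes a stepwise mutation model, so that the minimal number of events between two Y-STR haplotypes is the sum of absolute differences in repeat counts; it treats "between clans" as pairs from different clans within one village; and it uses a simple permutation test, since the report does not name the test that was used. The donors, clan labels and repeat counts are invented, and only 4 loci are shown instead of the panel of 20.

```python
import random
from itertools import combinations
from statistics import mean

# Invented toy data: (donor id, village, clan, repeat counts at 4 Y-STR loci).
DONORS = [
    ("d1", "V1", "A", (13, 24, 15, 10)),
    ("d2", "V1", "A", (13, 24, 15, 11)),
    ("d3", "V1", "A", (13, 25, 15, 10)),
    ("d4", "V1", "B", (14, 22, 17, 12)),
    ("d5", "V1", "B", (14, 22, 16, 12)),
    ("d6", "V2", "C", (12, 21, 14, 13)),
    ("d7", "V2", "C", (12, 21, 14, 14)),
]


def stepwise_distance(h1, h2):
    """Minimal number of single-repeat mutation steps between two haplotypes."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify(d1, d2):
    """Assign a pair of donors to one of the three comparison sets."""
    if d1[1] != d2[1]:
        return "between_villages"
    if d1[2] != d2[2]:
        return "between_clans"
    return "within_clan"


def pairwise_by_category(donors):
    """Collect pairwise genetic distances for each comparison set."""
    groups = {"within_clan": [], "between_clans": [], "between_villages": []}
    for a, b in combinations(donors, 2):
        groups[classify(a, b)].append(stepwise_distance(a[3], b[3]))
    return groups


def mean_gap(donors):
    """Mean distance outside clans minus mean distance within clans."""
    g = pairwise_by_category(donors)
    outside = g["between_clans"] + g["between_villages"]
    return mean(outside) - mean(g["within_clan"])


def permutation_p_value(donors, n_perm=2000, seed=0):
    """One-sided p-value: reshuffle haplotypes over donors, keeping labels fixed."""
    rng = random.Random(seed)
    observed = mean_gap(donors)
    haplotypes = [d[3] for d in donors]
    hits = 0
    for _ in range(n_perm):
        shuffled = haplotypes[:]
        rng.shuffle(shuffled)
        permuted = [(d[0], d[1], d[2], h) for d, h in zip(donors, shuffled)]
        if mean_gap(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    for name, dists in pairwise_by_category(DONORS).items():
        print(name, mean(dists))
    print("p-value:", permutation_p_value(DONORS))
```

Permuting haplotypes rather than treating each pair as an independent observation is a deliberate choice here: pairwise distances that share a donor are not independent, so an ordinary two-sample test applied to the pairs would overstate significance.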


Valency classes in the World's languages / Ed. by B. Comrie, A. Malchukov. Berlin : De Gruyter Mouton, 2015.
Daniel M., Khurshudian V. Valency Classes in Eastern Armenian, in: Valency classes in the World's languages / Ed. by B. Comrie, A. Malchukov. Berlin : De Gruyter Mouton, 2015. P. 483-540.
Ganenkov D. Infinitival Complementation from Caucasian Albanian to Modern Udi // Journal of Historical Linguistics. 2015. Vol. 5. No. 1. P. 110-138.
Chechuro I. Yu. The semantics of the localization -q- in the Andic languages [in Russian], in: Collected research papers of the Fourth Conference-School "Problems of Language: The View of Young Scholars". [n.p.], 2015. P. 297-311.