Goals of the project
Within the project, we document languages and study the language structures, sociolinguistic situation and genetic diversity of various ethnic groups in the Northern Caucasus.
This includes collecting and analysing genetic samples from the traditional settlement areas of different ethnic groups in Daghestan (controlling for patrilineal relatedness); collecting sociolinguistic data (questionnaires on multilingualism); documenting minority languages; and field and corpus research on grammar.
The project employs linguistic, sociolinguistic and genetic methods of data analysis. Linguistic analysis uses the methods of comparative and synchronic analysis of grammar. Acoustic data are studied by methods of instrumental acoustic analysis, primarily by means of the free annotation-and-analysis program Praat. The study of the sociolinguistic situation in Daghestan uses sociolinguistic questionnaires on multilingualism (Dobrushina 2013). We interview our consultants to reconstruct the traditional patterns of multilingualism. Genetic analysis targets patrilineal kinship based on genetic data from loci on the Y-chromosome (a panel of Y-chromosome microsatellites is used to establish family relatedness through the male line).
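As an illustration only, the comparison of Y-chromosome microsatellite profiles can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the locus names (DYS19, DYS390 and so on) are commonly used Y-STR markers chosen for the example, the repeat counts are invented, and the two distance measures (number of mismatching loci and summed stepwise difference) are generic ways of comparing haplotypes rather than the specific method used in the study.

```python
from typing import Dict, List

# A haplotype maps a Y-STR locus name to its repeat count.
# Locus names and values below are hypothetical, for illustration only.
Haplotype = Dict[str, int]


def shared_loci(a: Haplotype, b: Haplotype) -> List[str]:
    """Return the loci typed in both haplotypes, in sorted order."""
    return sorted(set(a) & set(b))


def mismatch_count(a: Haplotype, b: Haplotype) -> int:
    """Count the shared loci at which the two donors differ."""
    return sum(1 for locus in shared_loci(a, b) if a[locus] != b[locus])


def stepwise_distance(a: Haplotype, b: Haplotype) -> int:
    """Sum the absolute differences in repeat counts over shared loci."""
    return sum(abs(a[locus] - b[locus]) for locus in shared_loci(a, b))


# Invented example donors (not project data).
donor_a = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 11, "DYS393": 13}
donor_b = {"DYS19": 14, "DYS390": 23, "DYS391": 10, "DYS392": 11, "DYS393": 13}
donor_c = {"DYS19": 16, "DYS390": 25, "DYS391": 11, "DYS392": 13, "DYS393": 12}

if __name__ == "__main__":
    for name, other in (("b", donor_b), ("c", donor_c)):
        print(
            f"a vs {name}: mismatches={mismatch_count(donor_a, other)}, "
            f"stepwise={stepwise_distance(donor_a, other)}"
        )
```

In this toy example, donors a and b differ at a single locus by one repeat, which would be compatible with close relatedness through the male line, whereas donors a and c differ at every locus. Any threshold for treating two donors as patrilineally related would have to come from the genetic study itself.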
Empirical data for linguistic analysis proper come from texts in the languages of Daghestan. These fall into two groups: recordings of interviews and oral stories in minority languages or dialects (e.g. Rutul, Andi), and written texts in the major languages of Daghestan that are openly available on the internet. Marginally, we also use texts in minority languages from previously published academic sources. Grammar is also studied by interviewing consultants (including the translation of sentence stimuli). The study of phonetics involves direct perceptual analysis while interviewing the consultants, as well as perceptual and instrumental analysis of the data recorded from them. Empirical data on multilingualism come from oral interviews with highlanders. Genetic studies are based on genetic material obtained from male donors; as patrilineal kinship is targeted, we only consider loci on the Y-chromosome.
The research yields several types of outcomes, including corpora of minority and larger languages; phonetic databases of stops across Daghestanian languages; descriptions of various elements of East Caucasian (Nakh-Daghestanian) morphology and syntax; quantitative data on traditional multilingualism and language contacts; and data on patrilineal kinship across various ethnic groups in Daghestan.
More specifically, the following results were obtained in 2016:
We have processed audio recordings of the Kina dialect of Rutul (Lezgic) and the Rikwani dialect of Andi (Andic). We have annotated various phases of stops, targeting primarily voiceless stops and ejectives.
We have converted legacy corpora in the Bagvalal, Tsakhur, Chirag, Godoberi, Sanzhi Dargwa and Aghul languages to the www.eanc.net format and made them available online (web-corpora.net). The texts are provided with morphological glossing. We have done preliminary processing of texts in the Rikwani dialect of Andi and in Mehweb Dargwa; the texts were exported from Praat to ELAN and then to FieldWorks for further glossing. Godoberi legacy texts were OCR-ed and formatted (provided with word-by-word translation). We have processed wordlists for Dargwa and Lezgian and converted them to the UniParser format (to improve the Dargwa and Lezgian corpora). A list of lexical glosses in English has been compiled for lexical items frequent in the Eastern Armenian National Corpus (to provide www.eanc.net with glossed output).
Some technical resources were created to process online language sources in several languages of Daghestan (Avar, Standard Dargwa, Chechen). A package for dynamic map plotting was created for use in the typological atlas of the East Caucasian languages; additional geo-location information was integrated for Chechen and Ingush, and about ten maps became available online (the pilot version of the atlas is available at dagatlas.hol.es).
We have edited and submitted to Language Science Press a collection of descriptive papers in English on the grammar of Mehweb (the project contributed to collecting and editing the volume; project participants contributed two papers, one of which has also been published in HSE preprints). A study of the binominative construction in Lak was completed (published as another HSE preprint).
New genetic data were obtained from the DNA samples collected in Daghestan; the set of loci was increased and data from other populations were included (Aghuls and Chirags).