The goal comprises two major directions of research: documentation and description. The documentary part of the project includes making a large number of audio recordings in the Tsekob dialect; transcription, translation into Russian, and morphological analysis of the recorded texts; development and launch of a morphologically annotated online corpus of Tsekob; and lexical documentation. The descriptive part of the project involves the following activities: elicitation from native speakers for grammatical purposes (traditional topics of grammatical description); an in-depth investigation of a number of grammatical topics relevant from a theoretical or cross-linguistic perspective (valency, agreement, infinitival constructions, reported speech constructions, etc.); and publication of a grammatical description of Tsekob Akhvakh.
The secondary goals of the present project include the development of a morphologically annotated online corpus of Chirag Dargwa and theoretical analysis of a number of morphosyntactic phenomena in Nakh-Daghestanian languages.
The present project follows the current methodology of linguistic documentation and description, and conforms to the standards accepted within the ELDP (Endangered Languages Documentation Programme; SOAS, University of London). Within the project, a large number of spontaneous and semi-spontaneous (retellings of video clips) texts of different genres are being recorded (dialogues as well as narratives of various kinds, such as fairy tales, legends, life stories, everyday life, and procedural texts about weddings, funerals, and recipes) from men and women of different ages (mainly middle-aged and older speakers). The recordings are made using a Zoom H4n recorder with an external or built-in microphone, depending on the specific conditions at the field site. Technical specifications: uncompressed .wav format, 44.1 kHz, 16 bit. Metadata for the recorded texts are stored in the IMDI format. The texts are transcribed and translated in the field, using the ELAN software. The ELAN data are then converted to the FLEx format for further processing (cleaning up transcripts, semi-automatic morphological analysis, editing translations). The corpus will then be converted to the format of the East Armenian National Corpus, using a Python script, and launched online.
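The technical specifications above (uncompressed .wav, 44.1 kHz, 16 bit) can be checked automatically before a recording is archived. The following is a minimal sketch using only the Python standard library; it is not part of the project's own tooling, and the function name check_recording is a hypothetical name introduced here for illustration.

```python
import wave

# Target values taken from the specifications stated above.
EXPECTED_RATE_HZ = 44100        # 44.1 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample


def check_recording(path):
    """Return a list of problems found; an empty list means the file conforms.

    check_recording is a hypothetical helper, not part of the project's tooling.
    """
    try:
        wav = wave.open(path, "rb")
    except wave.Error as err:
        # The wave module only reads uncompressed PCM, so compressed or
        # malformed files are reported here.
        return [f"not readable as uncompressed PCM WAV: {err}"]

    problems = []
    with wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit")
    return problems
```

Calling check_recording on each file in a session folder and printing any non-empty result would flag recordings made with the wrong recorder settings while the team is still in the field.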
The Tsekob dialect of Southern Akhvakh is an endangered language of the Andic branch of Nakh-Daghestanian. Originally spoken in the village of Tsekob (Shamilskij district, Republic of Daghestan), the dialect has never had a large number of speakers. Since the 1990s, there has been massive migration from the village to the lowlands (primarily to Makhachkala and Kaspiysk), leaving fewer than one hundred speakers in the original location, all in their fifties or older. Young people leave the village right after graduating from high school, which means that the majority of Tsekob speakers live in a multi-ethnic environment where Russian is used as the lingua franca of everyday communication. The transmission of the language to children is severely compromised.
The corpus of Chirag Dargwa and the theoretical analysis of morphosyntactic phenomena are based on empirical data collected earlier by members of the lab.
Results of research
Within the project, working contact has been established with two native speakers of Tsekob who live in Makhachkala and are willing to participate in a project documenting their native language. A preliminary sociolinguistic study has established that the dialect is even more endangered than estimated before the start of the project: only about 80 speakers still live in the village, whereas all others have moved to Makhachkala, where children no longer acquire the language. We have recorded a number of oral texts of different genres, totaling about five hours; we are working with the consultants on their transcription and translation. We have also collected a word list of about 800 lexemes (nouns, verbs, adjectives, adverbs) with relevant morphological information.
The corpus of Chirag Dargwa has been extended by ca. 100,000 additional tokens of glossed and translated texts. A significant number of verbal forms have been assigned a morphological analysis, which will allow further semi-automatic analysis of these forms in the future.
We have written a grammatical sketch of the northern dialect of Tabasaran (Lezgic, Nakh-Daghestanian) as spoken in the village of Djuli (Tabasaranskij district, Republic of Daghestan).
We have investigated a sample of Nakh-Daghestanian languages with respect to one of the most actively debated topics in modern theoretical morphology, constraints on contextual allomorphy, and have found several counterexamples to a recent hypothesis that seeks to restrict contextual allomorphy and suppletion in terms of “structural containment.”
We have also studied allomorphy in Aqusha Dargwa and have shown that the choice of allomorph in verbal roots is determined by the values for aspect and tense, which are added later in the derivation. The conditioning is therefore non-local, since the root is separated from the tense and aspect morphemes by the causative suffix. This is a typologically unusual phenomenon that presents a problem for theoretical accounts of contextual allomorphy within Distributed Morphology. We have proposed a theoretical analysis of this violation: we suggest that verbal roots in Aqusha lack an elsewhere allomorph and thus cannot receive exponence within their locality domain. Instead, Vocabulary Insertion of the root must be delayed until the point in the derivation where information about aspect and tense becomes available.
Level of implementation, recommendations on implementation or outcomes of the implementation of the results
Once work on the corpus of Chirag Dargwa has been completed, the corpus will be made freely available online (the launch is planned for 2019).