Year of Graduation
Spatial Statistical Analysis for Linguistic Data: Gender Systems in Nakh-Daghestanian Languages
Fundamental and Computational Linguistics
This thesis applies spatial autocorrelation analysis to dialectological data from Nakh-Daghestanian languages. In particular, it investigates the distribution of gender systems across the language family. The research is built around reproducibility: the data were collected independently by two linguists, and inter-rater agreement was measured. Despite differences between the two approaches, the intraclass correlation coefficient was 0.971. Spatial analysis is a new trend in linguistic geography that might soon become essential. The spatial autocorrelation analysis used here comprises the Moran’s I and Getis-Ord Gi* tests, which detect global and local clustering of the chosen variables, respectively. The positive Moran’s I obtained indicates that the analyzed values are influenced by their neighbours. The Getis-Ord Gi* measure detected a low-value cluster in the South, where Lezgian and Aghul are situated. A high-value cluster was found in the North, where Chechen, Ingush, Andi and Chamalal are spoken. The observed clusters are not fully explained by phylogenetic affiliation, which might be a result of language contact. The dataset of Nakh-Daghestanian settlements and the database of gender systems are available online (https://doi.org/10.5281/zenodo.1253012) for anyone who wants to reproduce this research or conduct their own.
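As an illustration of the two statistics named above, the following minimal sketch computes a global Moran’s I and local Getis-Ord Gi* scores in plain Python. It is not the implementation used in the thesis. The coordinates, the values, the binary distance-band weighting scheme and all function names are assumptions made for this example only.

```python
import math

def distance_band_weights(coords, radius, include_self=False):
    """Binary weights: 1 if two points lie within `radius`, else 0 (assumed scheme)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                w[i][j] = 1.0 if include_self else 0.0
            elif math.dist(coords[i], coords[j]) <= radius:
                w[i][j] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

def getis_ord_gi_star(values, w_star):
    """Local Gi* z-scores; `w_star` must include each point as its own neighbour."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        wi = sum(w_star[i])
        wi2 = sum(x * x for x in w_star[i])
        lag = sum(w_star[i][j] * values[j] for j in range(n))
        denom = s * math.sqrt((n * wi2 - wi ** 2) / (n - 1))
        scores.append((lag - mean * wi) / denom)
    return scores

# Invented toy data: six settlements on a line, low values at one end, high at the other.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [2, 2, 2, 5, 5, 5]

w = distance_band_weights(coords, radius=1.0)
w_star = distance_band_weights(coords, radius=1.0, include_self=True)

moran = morans_i(values, w)
gi = getis_ord_gi_star(values, w_star)
print("Moran's I:", moran)   # positive for this toy layout
print("Gi* scores:", gi)     # negative at the low end, positive at the high end
```

In this toy layout a positive Moran’s I reflects that similar values sit next to each other, while the sign of each Gi* score marks whether a settlement belongs to a low-value or a high-value neighbourhood. Assessing significance would additionally require a permutation test or a comparison against the normal distribution, which the sketch omits.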