Automatic extraction of syllable structure from dictionaries: Slavic data

Student: Romanova Ksenia

Educational Programme: Fundamental and Computational Linguistics (Bachelor)

Year of Graduation: 2019

Much research focuses on syllabification and the development of syllabification algorithms, or on particular features of syllable structure (cluster complexity, syllables and rhymes), but syllable structure itself is rarely the main subject of interest. Previous studies of the syllable structure of Slavic languages have not described it from the perspective of semi-automatic analysis. The purpose of this work is therefore to describe the syllable structures of Macedonian, Polish and Russian, and to apply and improve the method described by G. Moroz (2019). Moreover, this kind of analysis is meant to be applicable to any set of words, so that no rare clusters or features are missed. Analyzing the structure of absolute word-initial and word-final positions makes it possible to study syllable structure in languages that have no strict syllabification rules. Macedonian, Polish and Russian are Slavic languages of the Indo-European family, belonging to the South, West and East Slavic branches respectively. These languages have complex syllable structure: complex onset and coda clusters are observed, consisting of two or more consonants. Consider the Russian monosyllabic words spat’ and plast, which have obstruent + obstruent clusters in the onset and coda respectively. I have already obtained some preliminary results for Macedonian, and they raise further questions: whether the observed features are specific to this language, or whether they result from language convergence and are therefore areal or genetic features. In addition, these languages vary with regard to syllabification, so arriving at a particular analysis of their syllable structure is problematic. I want to find and use a method that is suitable for this task. The study provides information on the frequency distribution of consonants and clusters. Vowels are not taken into account except for marking the syllable nucleus.
In my research I mostly explore consonant clusters and their composition. In this work, I apply and refine the method of semi-automatic calculation of syllable structure and of the complexity of syllable onsets and codas. The paper is intended to find and demonstrate an algorithm that is suitable for different kinds of data.
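
The idea of reading onset and coda complexity off the absolute beginnings and ends of words can be illustrated with a short sketch. This is not the method of Moroz (2019) or the implementation used in the thesis; it is a minimal illustration under stated assumptions: words are given in a simplified Latin transliteration, the vowel set `VOWELS` is a hypothetical placeholder, and the function names `edge_clusters` and `count_edge_clusters` are invented for this example.

```python
from collections import Counter

# Assumption: a simplified vowel inventory for transliterated input.
# A real study would need a language-specific set of nucleus symbols.
VOWELS = set("aeiouy")


def edge_clusters(word, vowels=VOWELS):
    """Return (initial, final) consonant strings of a word.

    The initial string is everything before the first vowel,
    the final string is everything after the last vowel.
    Returns None if the word contains no vowel from `vowels`.
    """
    word = word.lower()
    positions = [i for i, ch in enumerate(word) if ch in vowels]
    if not positions:
        return None  # no nucleus found, so the word is skipped
    return word[:positions[0]], word[positions[-1] + 1:]


def count_edge_clusters(words, vowels=VOWELS):
    """Count word-initial and word-final consonant strings in a word list."""
    onsets, codas = Counter(), Counter()
    for w in words:
        edges = edge_clusters(w, vowels)
        if edges is None:
            continue
        onset, coda = edges
        onsets[onset] += 1
        codas[coda] += 1
    return onsets, codas


# Illustrative toy word list (palatalization marks omitted for simplicity).
words = ["spat", "plast", "strana", "most", "oko"]
onsets, codas = count_edge_clusters(words)
print(onsets.most_common())
print(codas.most_common())
```

The empty string stands for a word that begins or ends with a vowel. Note that string length here counts letters, not segments, so orthographies with digraphs (for example Polish sz or cz) would need a tokenization step before cluster size can be interpreted.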
