Algorithms and Systems for Sound Recognition

Student: Shishkin Svyatoslav

Educational Programme: Data Science (Master)

Year of Graduation: 2019

This work is devoted to speaker diarization (also known as speaker separation): the process of determining from an audio recording how many people speak in it and attributing each speech segment to one of those speakers. The work is organized as follows. The theoretical part gives an overview of the basic elements of a typical modern diarization pipeline: extraction of low-level acoustic features from audio, speech detection, extraction of high-level acoustic features, and the diarization step itself. The second part formally states the problem, defines the target metric, and then describes the approach we use to solve the diarization problem. The result of this work is a system capable of performing speaker separation with accuracy comparable to state-of-the-art systems.
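To make the notion of a target metric concrete, here is a minimal sketch of a frame-level diarization error rate (DER). The abstract does not name the metric used in the thesis, so the choice of DER is an assumption, as are the function name `frame_der` and the convention that `None` marks a non-speech frame. Because hypothesis speaker labels are arbitrary, the sketch tries every one-to-one mapping between hypothesis and reference speakers and keeps the best one; this brute-force search is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate (illustrative, simplified).

    Both arguments are equal-length sequences of per-frame speaker labels;
    None marks a non-speech frame. Errors (missed speech, false alarm,
    speaker confusion) are counted under the best one-to-one speaker mapping
    and divided by the number of reference speech frames.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Pad the targets with unique placeholders so that surplus hypothesis
    # speakers can stay unmatched (a placeholder never equals a real label).
    surplus = max(0, len(hyp_speakers) - len(ref_speakers))
    targets = ref_speakers + [object() for _ in range(surplus)]

    best_errors = None
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is None:
                continue  # correctly detected non-speech
            if r is None or h is None or mapping[h] != r:
                errors += 1  # false alarm, missed speech, or confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors

    return best_errors / total_speech


reference = ["A", "A", "B", "B", None, None]
hypothesis = ["x", "x", "x", "y", None, "y"]
print(frame_der(reference, hypothesis))
```

Here the best mapping is `x` to `A` and `y` to `B`, which leaves one confused frame and one false alarm out of four reference speech frames. Standard DER scoring is time-weighted and often applies a tolerance collar around segment boundaries; both are omitted here for brevity.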

