Style-transfer Autoencoder for Efficient Deep Voice Conversion

Student: Zuenko Denis

Supervisor: Ilya Makarov

Faculty: Faculty of Computer Science

Educational Programme: Statistical Learning Theory (Master)

Final Grade: 7

Year of Graduation: 2020

To perform voice cloning, which is in demand in many film-related industries, we took as our basis AutoVC, a state-of-the-art model for voice conversion. Although autoencoders are not the most popular solution for this task, we consider speed and economy of computational resources a priority nowadays, so that resources remain available for other stages of the pipeline, such as the vocoder. We therefore investigated replacing the LSTM layers with convolutional layers while maintaining the quality of the original model. GANs remain a valid alternative, but a difficult one, because they are hard to train. To explore the capabilities of AutoVC, we expanded the dataset with noisier data, cleaned it thoroughly, and used it with our implementation. As in the original work, we first convert the audio into mel-spectrograms and then train the models on them. This approach is accessible and effective but, in our opinion, adds extra complexity: the authors of AutoVC and of other models that represent data this way must run a vocoder after their models to recover a waveform. With a WaveNet vocoder, for example, converting a single voice recording may take much longer than the duration of the original audio track. Our results show that replacing the LSTM with convolutional layers improves speed, and this is especially noticeable on longer voice tracks, because the number of LSTM operations grows with the length of the track. The LSTM also slows down training somewhat; with our model, training is faster even on a complex dataset. The result is faster training and inference with only a very slight deterioration in sound quality, as evidenced by the reconstruction loss and MSD.
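The length argument about LSTM versus convolutional layers can be made concrete with a toy dependency count. The sketch below is a hypothetical illustration, not code from the thesis: it models a unidirectional stacked recurrent layer and a stack of 1-D "same" convolutions over T mel-spectrogram frames, and measures the longest chain of steps that must run one after another. The function names, the three layers and the kernel size of 5 are arbitrary assumptions, not the configuration used in the thesis or in AutoVC, and bidirectional LSTMs are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]  # (layer index, frame index)
DepsFn = Callable[[int, int, int], List[Node]]


def lstm_deps(layer: int, frame: int, num_frames: int) -> List[Node]:
    """Unidirectional recurrent cell: needs its own previous step and the layer below.

    num_frames is unused here; it is kept so both dependency functions share a signature.
    """
    deps: List[Node] = []
    if frame > 0:
        deps.append((layer, frame - 1))
    if layer > 0:
        deps.append((layer - 1, frame))
    return deps


def conv_deps(layer: int, frame: int, num_frames: int, kernel: int = 5) -> List[Node]:
    """1-D 'same' convolution: needs only a local window of the layer below."""
    if layer == 0:
        return []  # the first layer reads the input mel-spectrogram directly
    half = kernel // 2
    lo = max(0, frame - half)
    hi = min(num_frames, frame + half + 1)
    return [(layer - 1, j) for j in range(lo, hi)]


def critical_path(num_layers: int, num_frames: int, deps_fn: DepsFn) -> int:
    """Length of the longest chain of steps that must run one after another.

    Nodes are visited layer by layer, frame by frame, which is a valid
    topological order for both dependency patterns above.
    """
    depth: Dict[Node, int] = {}
    for layer in range(num_layers):
        for frame in range(num_frames):
            preds = deps_fn(layer, frame, num_frames)
            depth[(layer, frame)] = 1 + max((depth[p] for p in preds), default=0)
    return max(depth.values())


if __name__ == "__main__":
    for frames in (100, 400, 1600):
        print(frames,
              critical_path(3, frames, lstm_deps),
              critical_path(3, frames, conv_deps))
```

Under these assumptions the recurrent stack has a sequential depth of T + 2 (102, 402 and 1602 for the three track lengths), while the convolutional stack stays at 3 regardless of length. Total arithmetic grows linearly with T in both cases; the difference is that all frames of a convolutional layer can be computed in parallel, which is the usual reason convolutional encoders run faster than recurrent ones on long inputs.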

Full text (added May 25, 2020)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.