Recognition of G-quadruplexes by Methods of Deep Learning in the Saccharomyces cerevisiae Genome

Student: Irina Balaban-irmenina

Supervisor: Maria Poptsova

Faculty: Faculty of Computer Science

Educational Programme: Applied Mathematics and Information Science (Bachelor)

Final Grade: 9

Year of Graduation: 2020

G-quadruplexes are secondary structures of nucleic acids that are found in the genomes of various species. Studies show that G-quadruplexes play an important role in the regulation of key cellular processes: transcription, translation, and replication. This work is devoted to the application of deep learning methods to the recognition of G-quadruplexes in the genome of Saccharomyces cerevisiae. Machine learning and deep learning methods for recognizing G-quadruplexes, such as CNNs and RNNs, have been considered previously and proved to be convenient and effective tools for detecting secondary structures of DNA and RNA. In this work, the task was to test architectures designed to solve NLP problems, since nucleotide sequences can be represented as sentences of a natural language. Models of the “transformer” type, which currently prevail in solving NLP problems, were chosen. I managed to train four types of models: “FlauBERT”, “CamemBERT”, “RoBERTa” and “XLNet”, of which “CamemBERT” showed the best performance. The work demonstrated that deep learning models based on “transformer” architectures can be used for G-quadruplex recognition. For this task, “transformers” showed results comparable to those of CNNs and RNNs.
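
The idea of reading a nucleotide sequence as a sentence can be illustrated with a short sketch. The summary does not say how sequences were turned into tokens, so the overlapping k-mer scheme below is only an assumption, and the function name `kmer_sentence`, its parameters `k` and `stride`, and the toy sequence are hypothetical choices made for illustration.

```python
def kmer_sentence(sequence: str, k: int = 6, stride: int = 1) -> str:
    """Turn a nucleotide sequence into a space-separated 'sentence' of k-mers.

    Assumption: k-mer tokenization is one common way to feed DNA to
    NLP models; it is not confirmed as the method used in the thesis.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence; each window is one "word".
    words = [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]
    return " ".join(words)


if __name__ == "__main__":
    # Toy G-rich sequence, chosen only for illustration.
    example = "GGGTTAGGGTTAGGGTTAGGG"
    print(kmer_sentence(example, k=3, stride=3))
    # prints: GGG TTA GGG TTA GGG TTA GGG
```

A sentence produced this way could then be passed to a standard text tokenizer; the choice of `k` and `stride` trades vocabulary size against sequence length.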

Full text (added May 20, 2020)
