Year of Graduation
Annotation of Functional Genome Elements Using Supervised Learning Methods
Applied Mathematics and Information Science
This paper applies machine learning methods to the recognition of G-quadruplexes. Studies show that G-quadruplexes play an important role in the regulation of gene expression and can influence the development of severe diseases, which makes their recognition a relevant task. The paper describes the construction of two neural network models that predict G-quadruplex formation. The first model uses a convolutional neural network to predict whether a G-quadruplex is present in a DNA fragment with a fixed length of 500. The second model has a more complex architecture that combines a bidirectional LSTM block with convolutional layers; it refines the boundaries of the G-quadruplex within a fragment in which the first network detected one. This two-level model allows long DNA sequences to be annotated effectively. The resulting model significantly outperforms the existing pattern-based methods.
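The two-level annotation scheme can be illustrated with a minimal sketch. It is not the implementation from this work: the function names (`split_windows`, `annotate`, `pattern_detect`, `pattern_locate`) are hypothetical, the windows are assumed to be non-overlapping (so a motif straddling a window border would be missed), and both neural networks are replaced by a stand-in based on the widely used G3+N1-7G3+N1-7G3+N1-7G3+ motif, which is the kind of pattern-based baseline the models are compared against.

```python
import re
from typing import Callable, List, Tuple

# Pattern-based baseline (assumption: the common G3+ N1-7 motif):
# four runs of at least three guanines separated by loops of 1-7 bases.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")

WINDOW = 500  # fragment length used by the first-level model in the text

Span = Tuple[int, int]


def split_windows(seq: str, size: int = WINDOW) -> List[Tuple[int, str]]:
    """Cut seq into consecutive non-overlapping fragments (the last may be shorter)."""
    return [(start, seq[start:start + size]) for start in range(0, len(seq), size)]


def annotate(
    seq: str,
    detect: Callable[[str], bool],
    locate: Callable[[str], List[Span]],
    size: int = WINDOW,
) -> List[Span]:
    """Two-level annotation: run `locate` only on fragments flagged by `detect`."""
    hits: List[Span] = []
    for offset, fragment in split_windows(seq, size):
        if detect(fragment):  # level 1: is a G-quadruplex present?
            for begin, end in locate(fragment):  # level 2: where exactly?
                hits.append((offset + begin, offset + end))
    return hits


# Stand-ins for the two trained networks (hypothetical placeholders).
def pattern_detect(fragment: str) -> bool:
    return G4_PATTERN.search(fragment) is not None


def pattern_locate(fragment: str) -> List[Span]:
    return [m.span() for m in G4_PATTERN.finditer(fragment)]


if __name__ == "__main__":
    sequence = "A" * 600 + "GGGTTAGGGTTAGGGTTAGGG" + "A" * 400
    print(annotate(sequence, pattern_detect, pattern_locate))
```

In such a design, trained models would be passed in place of `pattern_detect` and `pattern_locate`. The cheap first-level check filters out most fragments, so the more expensive boundary model runs only where a G-quadruplex is likely.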