CNN Applications to Recognition of Genomic Sequences
Big Data Systems
The evolution of Big Data and the development of techniques and tools for storing, processing and analyzing it have driven the rapid growth of data-driven approaches among corporations and the scientific community. Biology, and genomics in particular, is no exception. Falling costs and the technical simplification of genome sequencing, together with the availability of methods for processing large volumes of DNA data, have given a boost to bioinformatics studies. This master's thesis examines the mechanism of retrotransposition. The data consist of the ends of mRNAs, processed pseudogenes and transposons, with dinucleotide-shuffled sequences used for comparison. Benchmark studies of the similarities between these classes are performed using convolutional neural networks, a modern deep learning method. The results demonstrate the strong performance of the considered models and confirm the key study hypotheses about the similar nature of the observed classes and a common mechanism of retrotransposition. The methods developed in this thesis can be extended to other genomic data. Applications to biotechnology and biomedicine as a potential means of DNA editing are discussed.