Year of Graduation
Transposon Recognition by Machine-Learning Methods
Big Data Systems
The role of 3’ UTR stem-loop secondary structures in retrotransposition has been shown experimentally for mobile genetic elements of various species in which LINE and SINE retrotransposons share the same 3’ UTR sequence containing a stem-loop. This work investigates the properties of the 3’-end stem-loops of human L1 and Alu elements, which differ in sequence but all have 3’ UTR stem-loops. Two types of machine-learning models were built, sequence-based and structure-based, to recognize 3’-end L1 and Alu stem-loops with high accuracy. The sequence-based models use only sequence statistics and capture compositional bias in the 3’-ends. The structure-based models take into account chemical, physical, and geometrical characteristics of dinucleotides in the stem and position-specific nucleotide features of the loop and bulge. The most significant parameters include shift, rise, tilt, and hydrophilicity. The results point to the existence of structural constraints on the 3’ UTR stem-loops of L1 and Alu, which are probably required for transposition.
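As an illustration of what "sequence statistics" could mean for the sequence-based models, the following minimal Python sketch computes normalized k-mer frequencies for a nucleotide sequence. It is not the feature set used in this work: the function name `kmer_frequencies`, the default k = 2, and the toy input sequence are assumptions made only for the example. Vectors of this kind are one common way to expose compositional bias to a classifier.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration; the abstract does not specify
    which sequence statistics the models actually use.
    """
    # Treat RNA and DNA input alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    # Enumerate all 4**k possible k-mers so every vector has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count overlapping k-mers with a sliding window.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Ignore windows containing ambiguous symbols such as N.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    # Toy sequence, made up for the example (not a real L1 or Alu 3' end).
    features = kmer_frequencies("GGCCAAAAGGCC", k=2)
    print(len(features))
```

Each sequence yields a fixed-length vector (16 values for k = 2), which can be passed to any standard classifier. The structure-based features described above would require additional, per-dinucleotide physico-chemical tables that are not reproduced here.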