Regulation of Nucleosome Positioning by DNA Secondary Structures
Data Analysis for Biology and Medicine
Besides the canonical right-handed double helix (B-DNA) proposed by Watson and Crick in 1953, DNA can fold into other structures. These unusual conformations may be involved in key cellular processes, including transcription. One way transcription can be regulated is through nucleosome positioning. The nucleosome, the basic structural unit of chromatin, is known to form only on B-DNA, so non-canonical DNA may alter the nucleosome profile. To assess the association between chromatin and non-canonical DNA structures, this work studies the relative positioning of nucleosomes and unusual DNA structures (Z-DNA, H-DNA, G-quadruplexes and SIDD sites). Three types of interaction were identified. Machine learning models were trained with two techniques (a random forest classifier and a gradient boosting classifier) to predict these types of association from sequence composition (dinucleotide and trinucleotide statistics). The overall accuracy was 94% for G-quadruplexes but only about 60% for the other structures. This suggests that sequence composition is sufficient to predict the association between G-quadruplexes and nucleosomes, whereas for the other structures it is a poor predictor. Incorporating the physical and chemical properties of the sequences might improve predictive power.
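The sequence-composition features mentioned above can be illustrated with a short sketch. It assumes that each sequence is described by the relative frequencies of all 16 dinucleotides and all 64 trinucleotides, giving an 80-value feature vector. The function names `kmer_frequencies` and `composition_features` are hypothetical, and the windowing, normalisation and handling of ambiguous bases used in the thesis are not specified in the abstract.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-mer over A/C/G/T in seq.

    Assumption: windows containing symbols other than A, C, G, T
    (e.g. N) are skipped rather than counted.
    """
    seq = seq.upper()
    # All 4**k possible words, in lexicographic order (AA, AC, ..., TT for k=2).
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # ignore windows with ambiguous bases
            counts[word] += 1
            total += 1
    # Normalise counts to frequencies; an empty sequence yields all zeros.
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def composition_features(seq: str) -> list[float]:
    """Hypothetical feature vector: 16 dinucleotide + 64 trinucleotide frequencies."""
    di = kmer_frequencies(seq, 2)
    tri = kmer_frequencies(seq, 3)
    return [di[w] for w in sorted(di)] + [tri[w] for w in sorted(tri)]


if __name__ == "__main__":
    features = composition_features("GGGTTAGGGTTAGGGTTAGGG")
    print(len(features))  # 80 features per sequence
```

Stacking one such vector per sequence gives a feature matrix that could be passed, together with the interaction-type labels, to scikit-learn's `RandomForestClassifier` or `GradientBoostingClassifier` via their `fit(X, y)` methods. Relative frequencies rather than raw counts are used here so that sequences of different lengths remain comparable.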