
Finding Patterns in Epigenomics Data

Student: Ibragimova Diana

Supervisor: Maria Poptsova

Faculty: Graduate School of Business

Educational Programme: Big Data Systems (Master)

Year of Graduation: 2018

Data volumes are growing dramatically, and people in many fields have to deal with the problem of handling them. Bioinformatics is one such field: the amount of information is huge, and scientists need new techniques to process it. One international consortium, the Roadmap Epigenomics project, stores Next Generation Sequencing (NGS) experiments from different tissues. Around 3,000 experiments are currently available to the research community for three major classes of tissues: adult organs, fetal organs and embryonic stem cells. Another layer of genome annotation still requires elucidation: annotation with DNA secondary structures. Together with the epigenetic layers, these structures make up patterns of genome regulation.

The objective of the present study is to find patterns in epigenomic data associated with DNA secondary structures in order to reveal hidden regularities. Finding patterns in unknown data is a general task. In this Master's thesis we explore different machine-learning algorithms, both supervised and unsupervised, as applied to genomic big data.

This type of analysis has both scientific value and practical applications in personalized medicine. Patient-specific mutations and patterns of epigenomic markers should be mapped to the existing genomic annotations. Two important tasks are to classify new genomic data into existing classes (supervised learning) and to define groups of patients in unlabelled data (unsupervised learning). In personalized medicine, the programs developed for researching and analysing large amounts of genomic data could also be integrated into a complete analytical system that classifies an individual patient by their genomic characteristics and makes personalized recommendations.

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
