
Modelling of Co-expression Gene Networks and Splicing Networks by Glasso Method

Student: Didkovskaya Natalia

Supervisor: Dmitry Davidovich Pervouchine

Faculty: Faculty of Computer Science

Educational Programme: Data Analysis for Biology and Medicine (Master)

Year of Graduation: 2018

Gene expression data usually come in large panels of RNA-seq experiments, in which gene activity is measured across the genome under a variety of conditions. Relationships between genes can be inferred for groups of genes that show similar expression profiles and summarized as a gene co-expression network. In this network, the nodes are genes and the edges are associations between gene expression levels. A properly constructed co-expression network can help to understand the transcriptional regulatory system and the pathways and mechanisms behind complex biological processes. On the other hand, co-expression networks describe concerted gene activity inferred from co-occurrence, so the connections between genes are not physical, as they are, for instance, in protein-protein interaction networks. A problem inherent to the construction of all co-expression networks is overfitting: the number of protein-coding genes in the human genome is currently estimated at 20 000, while the number of samples in a typical panel of RNA-seq experiments is on the order of hundreds, so there are far more variables than observations. To address this problem, the graphical lasso (glasso) approach is applied in this work. In this method, gene co-expression is modelled as a graph derived from the inverse covariance matrix of a multivariate normal distribution, which is estimated under an L1 penalty that favours sparse graphs. First, we use simulated samples of different sizes and graphs from different background distributions to assess the performance of glasso. Next, we assess the accuracy of glasso for bipartite graphs, which model splicing networks. Finally, we apply glasso to the construction of co-expression and splicing networks using RNA-seq data from the Genotype-Tissue Expression (GTEx) project and identify hub nodes, i.e., splicing factors that have the most connections with exons, including RPS13, YBX1, DDX17, DDX5, HSPA8, NPM1, SNRNP70, HNRNPA2B1, and PABPC1. This suggests an important regulatory function for these factors.
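The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes NumPy is available, simulates data from a small chain graph chosen purely for illustration, and solves the L1-penalized Gaussian likelihood problem with a simple ADMM iteration rather than the original glasso coordinate-descent algorithm. The function names (`soft_threshold`, `sparse_precision_admm`) and all parameter values (`alpha`, `rho`, the graph size and sample size) are hypothetical.

```python
import numpy as np


def soft_threshold(a, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def sparse_precision_admm(S, alpha, rho=1.0, n_iter=500, tol=1e-7):
    """Estimate a sparse inverse covariance matrix from covariance S.

    Minimizes -logdet(Theta) + trace(S @ Theta) + alpha * ||offdiag(Theta)||_1
    with ADMM (an assumed stand-in for the glasso solver used in the thesis).
    """
    p = S.shape[0]
    Z = np.eye(p)          # sparse copy of the precision matrix
    U = np.zeros((p, p))   # scaled dual variable
    for _ in range(n_iter):
        # Theta-update: closed form through an eigendecomposition.
        eigval, eigvec = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (eigvec * theta_eig) @ eigvec.T
        # Z-update: soft-threshold the off-diagonal entries only.
        Z_old = Z
        A = Theta + U
        Z = soft_threshold(A, alpha / rho)
        np.fill_diagonal(Z, np.diag(A))  # the diagonal is not penalized
        # Dual update.
        U = A - Z
        if np.linalg.norm(Z - Z_old) < tol:
            break
    return (Z + Z.T) / 2.0  # symmetrize against floating-point drift


# Simulated example: a chain graph on p "genes" (illustrative values only).
rng = np.random.default_rng(0)
p, n = 8, 1000
true_precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_precision), size=n)
S = np.cov(X, rowvar=False)

est = sparse_precision_admm(S, alpha=0.1)

# Edges are the nonzero off-diagonal entries of the estimated precision matrix.
edges = {(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(est[i, j]) > 1e-8}
# Node degree; nodes with the highest degree would be candidate hubs.
degree = (np.abs(est) > 1e-8).sum(axis=0) - 1
print(sorted(edges))
print(degree)
```

Larger values of `alpha` give sparser graphs, so in practice the penalty would need to be tuned, for example by comparing recovered edges with the known graph in simulations of this kind.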

Student theses at HSE must be completed in accordance with the University rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
