
Methods of Statistical Dimension Reduction

Student: Tagarova Be`lla

Supervisor: Vladimir Panov

Faculty: School of Statistics, Data Analysis and Demography

Educational Programme: Bachelor

Year of Graduation: 2014

<p>Humanity today generates more than 1,000 exabytes of data per year. This has driven rapid development of technologies for data collection and of the capacity to analyze the available information. As a result, the study of many phenomena now involves high-dimensional data, which raises two major problems: first, obtaining reliable estimates of a model on high-dimensional data requires a large sample, which costs time and money; second, analyzing large data sets is prohibitively expensive. Technologies currently being introduced in Russia and abroad (such as Big Data) can address the second problem, but the optimal solution to the first problem remains an open question.</p>
<p>The analysis of high-dimensional data is currently relevant in many areas: genomics, environmental science, business intelligence, analytics and social media.</p>
<p>As an example, consider one of these areas, genomics. Genomics is the branch of molecular genetics devoted to the study of the genomes and genes of living organisms. The amount of information contained in all the DNA molecules of a single person is more than 100 times the amount of information generated by all of humanity in one year. Geneticists therefore face the important question of how to reduce the dimension of the data before analyzing the problem under study.</p>
<p>The statements above determine the relevance and practical importance of studying data dimension reduction techniques.</p>
<p>The object of the research is the problem of data dimension reduction.</p>
<p>The subject of the research is three techniques of data dimension reduction:</p>
<p style="margin-left:49.65pt;">&bull; Minimum average variance estimation (MAVE method)</p>
<p style="margin-left:49.65pt;">&bull; Outer product of gradients estimation (OPG method)</p>
<p style="margin-left:49.65pt;">&bull; Inverse minimum average variance estimation (iMAVE method)</p>
<p>The aim is to conduct a comparative analysis of statistical methods for data dimension reduction.</p>
<p>To achieve this aim, the following tasks are set:</p>
<p style="margin-left:49.65pt;">&bull; To examine current dimension reduction techniques and the restrictions imposed on the model in order to prove the effectiveness of the proposed approaches.</p>
<p style="margin-left:49.65pt;">&bull; To carry out a comparative analysis of the methods by comparing their rates of convergence.</p>
<p style="margin-left:49.65pt;">&bull; To conduct a simulation study comparing the performance of the methods.</p>
<p>The first chapter of this work presents the form of the regression models and the theoretical justification of the selected dimension reduction techniques. The second chapter considers the algorithms of the dimension reduction techniques. The third chapter describes the results of simulations for all the models considered in this study.</p>
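<p>As an illustration of the OPG method listed above, the following is a minimal sketch, not the implementation used in the thesis. It assumes a Gaussian kernel, a fixed hand-picked bandwidth, and a synthetic single-index model; the function name <code>opg_directions</code> and all parameter values are hypothetical choices made for this example. At each sample point a kernel-weighted local linear regression estimates the gradient of the regression function, and the leading eigenvectors of the averaged outer product of these gradients are taken as the estimated reduction directions.</p>

```python
import numpy as np


def opg_directions(X, y, d, bandwidth):
    """Sketch of outer-product-of-gradients estimation (illustrative only).

    X: (n, p) predictors, y: (n,) response, d: number of directions to keep,
    bandwidth: Gaussian kernel width (an assumed, hand-picked value).
    Returns a (p, d) matrix with orthonormal columns.
    """
    n, p = X.shape
    M = np.zeros((p, p))
    for j in range(n):
        diff = X - X[j]  # predictors centred at the j-th sample point
        # Gaussian kernel weights based on distance to the j-th point
        w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / bandwidth ** 2)
        # Local linear fit: y_i ~ a + b^T (x_i - x_j), solved as weighted least squares
        Z = np.hstack([np.ones((n, 1)), diff])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        grad = coef[1:]  # slope b is the local gradient estimate
        M += np.outer(grad, grad)
    M /= n
    # Eigenvectors with the largest eigenvalues span the estimated directions
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]]


# Synthetic single-index model (assumed for illustration): y depends on X only through X @ beta
rng = np.random.default_rng(0)
n, p = 300, 5
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.standard_normal((n, p))
index = X @ beta
y = index + 0.5 * index ** 2 + 0.1 * rng.standard_normal(n)

B = opg_directions(X, y, d=1, bandwidth=1.5)
alignment = abs(B[:, 0] @ beta)  # absolute cosine between estimated and true direction
print(f"|cos| between estimated and true direction: {alignment:.3f}")
```

<p>In practice the bandwidth would be chosen by a data-driven rule rather than fixed by hand, and the number of directions <code>d</code> is usually unknown and must be estimated as well.</p>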

