# Curvature Effect on Sample-Based Manifold Intrinsic Dimension Estimation

Student: Oleg Bulanov

Supervisor:

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Machine learning and data analysis methods frequently have to deal with high-dimensional data. Analyzing and preprocessing such data requires substantial computational resources and can reduce the prediction quality of the methods. However, data features are frequently correlated, so it is often possible to find a low-dimensional representation of the data without significant information loss. Dimensionality reduction methods address this problem. These methods often need the dimension of the representation as an input parameter, which leads to the problem of estimating that parameter. A common assumption is that the data lie near a low-dimensional manifold; the dimension of this manifold is called the intrinsic dimension. Several methods estimate this parameter, but their results are often imperfect. This paper focuses on the maximum likelihood method of intrinsic dimension estimation. The method tends to underestimate the dimension, which is why we examine its assumptions more closely. In particular, since the data lie near a manifold, it is reasonable to investigate how the estimate depends on the geometric properties of the manifold. The main result of the paper is that such a dependence was found, and numerical experiments confirm it.
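To make the maximum likelihood approach concrete, the following is a minimal sketch of a nearest-neighbor maximum likelihood estimator. It assumes the method meant here is the k-nearest-neighbor estimator of Levina and Bickel, which the text does not name explicitly. The function name `mle_intrinsic_dimension`, the neighborhood size `k = 20`, and the flat test sample are illustrative choices, not taken from the paper.

```python
import numpy as np


def mle_intrinsic_dimension(points, k=20):
    """Average of local nearest-neighbor MLE dimension estimates.

    Assumption: this follows the Levina-Bickel form, where the local
    estimate at x is the inverse of the mean of log(T_k(x) / T_j(x)),
    j = 1..k-1, and T_j(x) is the distance from x to its j-th neighbor.
    All points are assumed to be distinct.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; fine for a few thousand points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so columns 1..k hold the distances to the k nearest neighbors.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    local_estimates = (k - 1) / np.sum(log_ratios, axis=1)
    return float(np.mean(local_estimates))


# Illustrative sample: a flat 2-dimensional square embedded in R^5.
rng = np.random.default_rng(0)
flat = np.zeros((1000, 5))
flat[:, :2] = rng.uniform(size=(1000, 2))
estimate = mle_intrinsic_dimension(flat, k=20)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On a flat sample like this one the estimate should land near the true dimension of 2. Replacing the sample with points drawn from a curved surface is one way to probe how the estimate depends on the geometry of the manifold.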

