Master's programme 2019/2020

Nonparametric Theory and Methods of Data Analysis

Field of study: 01.04.02. Applied Mathematics and Informatics
When taught: 2nd year, 3rd module
Mode of study: Full time
Degree programme: Applied Statistics with Network Analysis Methods
Language: English
Credits: 4

Course Syllabus

Abstract

This course covers methods for common statistical tasks such as probability density estimation, describing dependence structures via regression models, and statistical testing. All methods considered in this course require only a few assumptions about the probabilistic properties of the model that generated the data. For instance, they forgo the assumption that the underlying distribution is normal. We show how to implement the approaches in statistical software (preferably in R) and demonstrate how these methods can be used to solve real-world problems.
Learning Objectives

  • The course gives students an important foundation to develop and conduct their own research as well as to evaluate research of others.
Expected Learning Outcomes

  • Know the theoretical foundation of nonparametric analysis
  • Be able to state the problem of the probability density estimation and estimate the distribution function
  • Have an understanding of the basic principles of nonparametric methods
  • Know modern extensions to applied statistical analysis
  • Be able to work with major statistical programs, especially R, and interpret their output
  • Have the skill to meaningfully develop an appropriate model for the research question
  • Know the basic principles behind working with all types of data for building nonparametric models
  • Have the skill to work with the statistical software required to analyze the data
  • Be able to critically review published empirical research that uses applied statistical methods
  • Be able to criticize constructively and identify issues with applied linear models in published work
Course Contents

  • Probability density estimation
    1. Statement of the problem. Estimation of the distribution function.
    2. Histogram as a density estimate. Bias-variance tradeoff: general concept and particular results for the histogram. Bias-variance decomposition for histograms. Minimization of AMISE for the histogram: Scott and Freedman-Diaconis rules for bin width selection. Other ideas for choosing the number of bins: Sturges' rule. The “pretty” procedure in the R language.
    3. Kernel density (Parzen-Rosenblatt) estimates. Bias-variance decomposition for kernel estimates. Minimization of AMISE for kernel estimates with respect to the kernel: Epanechnikov kernel. The notion of kernel efficiency. Minimization of AMISE for kernel estimates with respect to the bandwidth: nrd and nrd0 options. (Unbiased) cross-validation for probability density estimates.
    4. Rates of convergence for histogram and kernel density estimates. Lower bounds for density estimates: van der Vaart’s theorem.
  • Nonparametric regression
    1. Statement of the problem.
    2. Nearest neighbors algorithm, local averaging. The “super smoother” method. Cross-validation approach and the bass parameter.
    3. Local regression. The “loess” (“lowess”) method.
    4. Generalized cross-validation. Motivation of the algorithm: theorem on the closed form of the cross-validation error for linear regression.
    5. Akaike criterion.
    6. The Nadaraya-Watson kernel estimator. Modifications of this estimator (local polynomial estimator, Gasser-Müller estimator).
    7. The notion of a linear smoother. Regressogram.
  • Nonparametric tests
    1. Tests for independence I: Kendall’s tau. Unbiased estimate for Kendall’s tau. Exact distribution of this estimate for n=3. Large-sample approximation for the constructed estimate. Calculation of the mean and the variance. Construction of asymptotic confidence intervals. The notion of bootstrap. Relation between Kendall’s tau and the Pearson correlation coefficient.
    2. Tests for independence II: Spearman’s rho. Equivalent form of Spearman’s rho. Exact distribution of Spearman’s rho for n=3. Large-sample approximation for the constructed estimator. Calculation of the mean and the variance.
    3. Paired replicates data. Wilcoxon test. Exact distribution for n=3. Large-sample approximation. Calculation of the mean and the variance.
    4. Two independent samples. Wilcoxon statistics and Mann-Whitney statistics. Mann-Whitney test. Exact distribution for n=3 and m=2. Large-sample approximation. Calculation of the mean and the variance.
    5. Many independent samples. Kruskal-Wallis test. Relation to the ANOVA test. Exact distribution for k=3, n1=n2=n3=2. Large-sample approximation.
    6. Two-way layout. Friedman’s test.
  • Bonus lectures
    1. Wavelets. Haar basis. The notion of resolution. Application of this idea to the regression problem.
    2. Neural networks. Single-layer perceptron. Back-propagation algorithm.
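
To make a few of the topics listed above concrete, here is a minimal sketch of the bin-width rules, the Parzen-Rosenblatt estimate, the Nadaraya-Watson estimator, and Kendall's tau with its exact null distribution. It is written in Python rather than R purely for illustration. All function names and the sample data are made up for this sketch and are not taken from the course materials. The Gaussian kernel and the constant 3.49 in Scott's rule are common textbook choices assumed here, and the quartiles follow Python's default convention, which can differ slightly from R's.

```python
import itertools
import math
import statistics


def bin_widths(xs):
    """Histogram bin widths under the Scott, Freedman-Diaconis and Sturges rules."""
    n = len(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)  # quartile convention may differ from R's
    k_sturges = math.ceil(math.log2(n) + 1)    # Sturges gives a bin count, not a width
    return {
        "scott": 3.49 * statistics.stdev(xs) * n ** (-1 / 3),
        "fd": 2.0 * (q3 - q1) * n ** (-1 / 3),
        "sturges": (max(xs) - min(xs)) / k_sturges,
    }


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, xs, h):
    """Parzen-Rosenblatt density estimate at x with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in xs) / (len(xs) * h)


def nadaraya_watson(x, xs, ys, h):
    """Kernel-weighted local average of ys at x (assumes some weight is nonzero)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def kendall_tau(xs, ys):
    """Kendall's tau without ties: (concordant - discordant pairs) / C(n, 2)."""
    score = 0
    for i, j in itertools.combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        score += (product > 0) - (product < 0)
    return score / math.comb(len(xs), 2)


def exact_null_distribution(n):
    """Distribution of tau under independence: every rank order is equally likely."""
    ranks = list(range(n))
    counts = {}
    for perm in itertools.permutations(ranks):
        tau = kendall_tau(ranks, perm)
        counts[tau] = counts.get(tau, 0) + 1
    return {tau: c / math.factorial(n) for tau, c in sorted(counts.items())}


if __name__ == "__main__":
    sample_x = [0.3, 1.1, 1.9, 2.4, 3.8, 4.1, 5.5, 6.2]  # made-up data
    sample_y = [1.2, 1.9, 3.1, 3.0, 4.9, 5.2, 6.1, 7.4]  # made-up data
    print(bin_widths(sample_x))
    print(kde(2.0, sample_x, h=1.0))
    print(nadaraya_watson(2.0, sample_x, sample_y, h=1.0))
    print(kendall_tau(sample_x, sample_y))
    print(exact_null_distribution(3))
```

For n=3 the enumeration runs over all 6 rank orders and assigns probabilities 1/6, 1/3, 1/3, 1/6 to tau = -1, -1/3, 1/3, 1. The same brute-force idea extends to the other exact small-sample distributions in the list, although the number of permutations grows factorially.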
Assessment Elements

  • non-blocking Cumulative mark for the work during the module
  • non-blocking Final test
Interim Assessment

  • Interim assessment (3rd module)
    0.7 * Final test + 0.3 * Cumulative mark for the work during the module
Bibliography

Recommended Core Bibliography

  • Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. – Springer, 2009. – 745 pp.

Recommended Additional Bibliography

  • Lagutin, M. B. Наглядная математическая статистика (Visual Mathematical Statistics): textbook. – Лаборатория знаний (formerly БИНОМ. Лаборатория знаний), 2019. – ISBN: 978-5-00101-642-7. – Electronic text // ЭБС Лань. – URL: https://e.lanbook.com/book/116104