Nonparametric Statistics

Master's programme 2019/2020

Best by the criterion “Usefulness of the course for broadening one's horizons and all-round development”

Best by the criterion “Novelty of the knowledge gained”

Status: Elective course (Statistical Modelling and Actuarial Science)

Field of study: 38.04.01 Economics

Taught by: Department of Statistics and Data Analysis

Offered at: Faculty of Economic Sciences

When: 1st year, module 3

Mode of study: without an online course

Instructors: Vladimir Aleksandrovich Panov

Programme: Statistical Modelling and Actuarial Science

Language: English

Credits: 3

Contact hours: 40

Full Syllabus

Abstract

This course covers methods for solving common statistical tasks such as probability density estimation, describing dependence structures with regression models, and statistical hypothesis testing. All the methods considered require only a few assumptions about the probabilistic model that generated the data; for instance, they do not assume that the underlying distribution is normal. We show how to implement these approaches in statistical software (preferably in the R language) and demonstrate how they can be used to solve real-world problems.

Learning Objectives

1. Study the basic concepts of nonparametric statistics.
2. Study the mathematical background of nonparametric statistical methods.

Expected Learning Outcomes

Understanding the methodology of probability density estimation
Understanding the methods of nonparametric regression
Knowing the nonparametric tests for solving various statistical problems
Understanding the wavelet approach

Course Contents

Part I: Probability density estimation
1. Statement of the problem. Estimation of the distribution function.
2. The histogram as a density estimate. The bias-variance tradeoff: general concept and particular results for the histogram. Bias-variance decomposition for histograms. Minimization of AMISE for the histogram: Scott and Freedman-Diaconis rules for bin-width selection. Other approaches to choosing the number of bins: Sturges’ rule. The “pretty” procedure in the R language.
3. Kernel density (Parzen-Rosenblatt) estimates. Bias-variance decomposition for kernel estimates. Minimization of AMISE for kernel estimates with respect to the kernel: the Epanechnikov kernel. The notion of kernel efficiency. Minimization of AMISE for kernel estimates with respect to the bandwidth: the nrd and nrd0 options. (Unbiased) cross-validation for probability density estimates.
4. Rates of convergence for histogram and kernel density estimates. Lower bounds for density estimates: van der Vaart’s theorem.
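The bin-width rules and the kernel density estimate from Part I can be sketched in a few lines of Python (the course itself prefers R). This is a minimal sketch assuming untied, real-valued data; the function names are hypothetical, and they are only rough counterparts of R's `nclass.Sturges`, `nclass.scott`, `nclass.FD` and `bw.nrd0`, whose exact constants and edge-case handling may differ. The IQR uses linear interpolation (R's default quantile type 7). Here the Epanechnikov kernel lives on [-1, 1], so `h` is its half-width; R's `density()` instead scales every kernel so that `bw` is its standard deviation, so the two bandwidths are not directly comparable.

```python
import math
import statistics


def iqr(x):
    """Interquartile range with linear interpolation (R's default quantile type 7)."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q3 - q1


def sturges_bins(x):
    """Sturges' rule: number of bins k = ceil(log2(n) + 1)."""
    return math.ceil(math.log2(len(x)) + 1)


def scott_width(x):
    """Scott's rule (AMISE-optimal for Gaussian data): h = 3.49 * s * n^(-1/3)."""
    return 3.49 * statistics.stdev(x) * len(x) ** (-1 / 3)


def fd_width(x):
    """Freedman-Diaconis rule, a robust variant: h = 2 * IQR * n^(-1/3)."""
    return 2 * iqr(x) * len(x) ** (-1 / 3)


def nrd0_bandwidth(x):
    """Silverman's rule of thumb, as in bw.nrd0: 0.9 * min(s, IQR / 1.34) * n^(-1/5)."""
    return 0.9 * min(statistics.stdev(x), iqr(x) / 1.34) * len(x) ** (-1 / 5)


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kde(x, h, kernel=epanechnikov):
    """Parzen-Rosenblatt estimate f_hat(t) = 1 / (n h) * sum_i K((t - X_i) / h)."""
    n = len(x)
    return lambda t: sum(kernel((t - xi) / h) for xi in x) / (n * h)


if __name__ == "__main__":
    import random

    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(500)]
    print(sturges_bins(sample))  # → 10
    print(scott_width(sample), fd_width(sample))
    f_hat = kde(sample, nrd0_bandwidth(sample))
    # compare with the true N(0, 1) density at 0: 1 / sqrt(2 * pi) ≈ 0.399
    print(f_hat(0.0))
```

Scott's and Freedman-Diaconis' widths both shrink like n^(-1/3), the AMISE-optimal rate for histograms, while the kernel bandwidth shrinks like n^(-1/5), which is why kernel estimates converge faster.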
Part II: Nonparametric regression
1. Statement of the problem.
2. The notion of a linear smoother. The regressogram.
3. The nearest-neighbors algorithm; local averaging. The “Super smoother” method. The cross-validation approach and the bass parameter.
4. Local regression. The “Loess” (“Lowess”) method.
5. The Nadaraya-Watson kernel estimator. MISE for this estimator (without proof).
6. The Nadaraya-Watson kernel estimator as the solution of an optimization problem, and local polynomial estimates. The Gasser-Müller estimate.
7. Generalized cross-validation. Motivation of the algorithm: theorem on the closed form of the cross-validation error for linear regression.
8. The Akaike information criterion. The Kullback-Leibler discrepancy.
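The Nadaraya-Watson estimator and the cross-validation shortcut from Part II can be sketched as follows. Assumptions: a Gaussian kernel (any kernel would do), hypothetical helper names, and a bandwidth chosen by a simple grid search. Because the estimator is a linear smoother, r_hat(x_i) = sum_j L_ij * Y_j, leaving out observation i only renormalizes the remaining weights, so the leave-one-out error has the closed form CV(h) = (1/n) * sum_i ((Y_i - r_hat(x_i)) / (1 - L_ii))^2; generalized cross-validation replaces each L_ii by the average nu / n, where nu = trace(L).

```python
import math


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nw_weights(x, t, h, kernel=gaussian):
    """Nadaraya-Watson weights l_j(t) = K((t - x_j) / h) / sum_k K((t - x_k) / h)."""
    k = [kernel((t - xj) / h) for xj in x]
    total = sum(k)
    return [kj / total for kj in k]


def nw_estimate(x, y, t, h):
    """r_hat(t) = sum_j l_j(t) * y_j: the estimate is linear in the responses y."""
    return sum(w * yj for w, yj in zip(nw_weights(x, t, h), y))


def loocv_score(x, y, h):
    """Closed-form leave-one-out CV for a linear smoother:
    CV(h) = (1/n) * sum_i ((y_i - r_hat(x_i)) / (1 - L_ii))^2."""
    n = len(x)
    total = 0.0
    for i in range(n):
        w = nw_weights(x, x[i], h)  # row i of the smoothing matrix L
        fit = sum(wj * yj for wj, yj in zip(w, y))
        total += ((y[i] - fit) / (1 - w[i])) ** 2
    return total / n


def gcv_score(x, y, h):
    """GCV(h) = (RSS / n) / (1 - nu / n)^2 with nu = trace(L)."""
    n = len(x)
    rss, nu = 0.0, 0.0
    for i in range(n):
        w = nw_weights(x, x[i], h)
        rss += (y[i] - sum(wj * yj for wj, yj in zip(w, y))) ** 2
        nu += w[i]
    return (rss / n) / (1 - nu / n) ** 2


if __name__ == "__main__":
    xs = [i / 50 for i in range(51)]
    # deterministic wiggle standing in for noise
    ys = [math.sin(2 * math.pi * xi) + 0.3 * math.cos(37 * xi) for xi in xs]
    grid = [0.01 * k for k in range(2, 21)]
    best_h = min(grid, key=lambda h: gcv_score(xs, ys, h))
    print(best_h, nw_estimate(xs, ys, 0.25, best_h))
```

The closed form avoids refitting the smoother n times, which is the motivation for GCV mentioned in the syllabus.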
Part III: Nonparametric tests
1. Tests for independence I: Kendall’s tau. An unbiased estimate of Kendall’s tau. Exact distribution of this estimate for n=3. Large-sample approximation for the constructed estimate; calculation of the mean and the variance. Construction of asymptotic confidence intervals. The notion of the bootstrap. Relation between Kendall’s tau and the Pearson correlation coefficient.
2. Tests for independence II: Spearman’s rho. An equivalent form of Spearman’s rho. Exact distribution of Spearman’s rho for n=3. Large-sample approximation for the constructed estimator; calculation of the mean and the variance.
3. Paired replicates data. The Wilcoxon signed-rank test. Exact distribution for n=3. Large-sample approximation; calculation of the mean and the variance.
4. Two independent samples. The Wilcoxon rank-sum and Mann-Whitney statistics. The Mann-Whitney test. Exact distribution for n=3 and m=2. Large-sample approximation; calculation of the mean and the variance.
5. Several independent samples. The Kruskal-Wallis test. Relation to the ANOVA test. Large-sample approximation (without proof).
6. Two-way layout. Friedman’s test (general idea only).
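The exact null distributions for small samples, and their agreement with the means and variances used in the large-sample approximations, can be checked by brute-force enumeration. This sketch assumes no ties, uses exact rational arithmetic via `fractions.Fraction`, and takes U to count pairs with X_i < Y_j (one of two common conventions); all function names are hypothetical. Under the null hypothesis every pairing of ranks (Kendall, Spearman), every sign pattern (Wilcoxon signed-rank) and every placement of the Y-ranks among the pooled ranks (Mann-Whitney) is equally likely, so exact moments are plain averages over the enumerated outcomes.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def kendall_tau(x, y):
    """Kendall's tau without ties: (concordant - discordant pairs) / C(n, 2)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        s += (sign > 0) - (sign < 0)
    return Fraction(s, n * (n - 1) // 2)


def spearman_rho(rx, ry):
    """Spearman's rho from ranks without ties: 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def mean_var(values):
    """Exact mean and variance when every listed outcome is equally likely."""
    k = len(values)
    mean = sum(values, Fraction(0)) / k
    return mean, sum(((v - mean) ** 2 for v in values), Fraction(0)) / k


def exact_null_moments(n, m):
    """Enumerate null distributions: samples of size n (and m for Mann-Whitney)."""
    ranks = list(range(1, n + 1))
    perms = list(permutations(ranks))
    # Independence: all n! pairings of y-ranks with x-ranks are equally likely.
    tau = mean_var([kendall_tau(ranks, p) for p in perms])
    rho = mean_var([spearman_rho(ranks, p) for p in perms])
    # Paired replicates: each rank carries a "+" sign with probability 1/2.
    t_plus = mean_var([sum(r for r, pos in zip(ranks, signs) if pos)
                       for signs in product([False, True], repeat=n)])
    # Two samples: the m Y-ranks form a uniformly random m-subset of 1..n+m;
    # U = #{(i, j): X_i < Y_j} = (rank sum of Y) - m(m+1)/2.
    u = mean_var([sum(ys) - m * (m + 1) // 2
                  for ys in combinations(range(1, n + m + 1), m)])
    return {"tau": tau, "rho": rho, "T+": t_plus, "U": u}


def asymptotic_moments(n, m):
    """Closed-form null mean and variance used in the normal approximations."""
    return {
        "tau": (0, Fraction(2 * (2 * n + 5), 9 * n * (n - 1))),
        "rho": (0, Fraction(1, n - 1)),
        "T+": (Fraction(n * (n + 1), 4), Fraction(n * (n + 1) * (2 * n + 1), 24)),
        "U": (Fraction(n * m, 2), Fraction(n * m * (n + m + 1), 12)),
    }


if __name__ == "__main__":
    print(exact_null_moments(3, 2) == asymptotic_moments(3, 2))  # → True
```

For n = 3 the enumeration reproduces the exact distribution of Kendall's tau: the values ±1 each have probability 1/6 and ±1/3 each have probability 1/3, giving variance 11/27.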
Part IV: Bonus lecture
Wavelets. The Haar basis. The notion of resolution. Application of this idea to the regression problem.
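The Haar decomposition into resolution levels, and one common way of applying it to regression, can be sketched as follows. Assumptions: the sample size is a power of 2, the design points are equally spaced, and the regression step hard-thresholds the detail coefficients at the universal threshold sigma * sqrt(2 log n) of Donoho and Johnstone, with sigma estimated as the median absolute finest-level coefficient divided by 0.6745; the lecture may use a different rule, and the function names are hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal discrete Haar transform; len(signal) must be a power of 2.
    Returns [coarsest average, coarsest detail, ..., finest-level details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    approx = list(signal)
    details = []  # after the loop: coarsest level first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]  # halve the resolution
    return approx + [d for level in details for d in level]


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine resolution."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


def haar_denoise(y, sigma=None):
    """Wavelet regression sketch: hard-threshold all detail coefficients at
    sigma * sqrt(2 log n), keep the coarsest average, transform back."""
    n = len(y)
    c = haar_forward(y)
    if sigma is None:
        finest = c[n // 2:]  # the n/2 finest-level details are mostly noise
        sigma = statistics.median([abs(d) for d in finest]) / 0.6745
    lam = sigma * math.sqrt(2 * math.log(n))
    kept = c[:1] + [d if abs(d) > lam else 0.0 for d in c[1:]]
    return haar_inverse(kept)


if __name__ == "__main__":
    step = [0.0] * 8 + [1.0] * 8
    noisy = [v + 0.05 * (-1) ** i for i, v in enumerate(step)]
    print([round(v, 3) for v in haar_denoise(noisy)])
```

Because the Haar basis is orthonormal, the transform preserves the sum of squares, and a piecewise-constant regression function is captured by a few large coefficients that survive thresholding.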

Assessment Elements

Home assignments
Final exam

Interim Assessment

Interim assessment (module 3)
0.7 * Final exam + 0.3 * Home assignments

Bibliography

Recommended Core Bibliography

Wasserman, L. All of Nonparametric Statistics. Springer Science & Business Media, 2006. 270 pp.

Recommended Additional Bibliography

Sdvizhkov, O. A. Nonparametric Statistics in MS Excel and VBA (in Russian). DMK Press, 2014. 172 pp. ISBN: 978-5-94074-917-2. Electronic text, Lan electronic library system. URL: https://e.lanbook.com/book/58695
