Master's programme 2019/2020

Introduction to Data Science

Field of study: 38.04.02. Management
When taught: Year 1, Module 2
Mode of study: Full time
Programme: Business Development Strategies: Management and Consulting
Language: English
Credits: 4

Course syllabus


The course "Introduction to Data Science" covers the basics of data analysis, statistics, digital signal processing, and machine learning through a series of lectures and practical work based on MS Excel and MS Azure tools.
Course objectives

  • Introduce data science: teach the basics of working with data
  • Introduce statistics
  • Introduce digital signal processing
  • Introduce machine learning
  • Introduce MS Excel and MS Azure
  • Develop critical thinking
Learning outcomes

  • Be able to calculate (1) the mode, median, and mean; (2) the variance and standard deviation
  • Be able to calculate confidence intervals
  • Be able to test hypotheses using statistical tests
  • Be able to calculate correlation
  • Be able to train machine learning algorithms for classification, clustering, and regression tasks
  • Be able to choose a metric for evaluating the quality of an algorithm
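
The first four outcomes above can be illustrated with a short sketch. The course itself works in MS Excel and MS Azure, so Python is used here only as an assumption for illustration. The function names, the sample data, and the choice of a normal approximation (z = 1.96) for a 95% confidence interval are all hypothetical; for a sample this small, a Student's t quantile would usually be more appropriate.

```python
import math
import statistics


def describe(sample):
    """Return the descriptive statistics named in the learning outcomes."""
    return {
        "mode": statistics.mode(sample),
        "median": statistics.median(sample),
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),  # sample variance, divides by n - 1
        "stdev": statistics.stdev(sample),        # square root of the sample variance
    }


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross_products / (spread_x * spread_y)


# Hypothetical data: hours of TV watched per day by ten respondents.
hours = [2, 3, 3, 4, 5, 3, 6, 2, 4, 3]
summary = describe(hours)
low, high = mean_confidence_interval(hours)
print(summary)
print(f"Approximate 95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The sample variance and standard deviation divide by n - 1; use `statistics.pvariance` and `statistics.pstdev` instead if the data is the whole population rather than a sample.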
Course content

  • Introduction
    1.1 Introducing the teacher and the course
    1.2 Motivation for data analysis and ML
    1) Introduction to the course (what we will study, how work is assessed)
    2) Summary of course topics
    3) Introductory information on data analysis, with examples of its use in industry
    4) Demonstration of data analysis on a non-obvious statistical example (for instance, "weight loss" and statistical significance)
    5) A few examples where the lack of competent analysis led to adverse consequences (for instance, the story of Bill Gates financing small schools, with the effect of regression to the mean shown through a "heads or tails" coin-toss experiment)
  • Statistics. Distribution.
    2.1 Probability and distributions
    2.2 Distribution parameters (mode, median, mean, kurtosis, skewness, range, variance, standard deviation)
    Introductory information on the basics of statistics: distributions, histogram parameters (the difference between the median, mode, and arithmetic mean), significance level, variance, and confidence intervals. Examples of statistical analysis on applied industry problems (distribution of viewers' income).
  • Statistics. Confidence intervals and hypothesis testing.
    3.1 The concept of a confidence interval
    3.2 Calculating a confidence interval, with examples
    3.3 Testing hypotheses using a confidence interval
    3.4 Calculating 3 sigma (the three-sigma rule), with examples
  • Hypothesis testing
    4.1 Significance tests
    4.2 Significance level
    4.3 Student's t-test, chi-square test, Mann-Whitney test
    4.4 Testing hypotheses using statistical tests
    4.5 Calculating the significance level. Testing a hypothesis at significance level p.
  • Correlation and other data processing methods
    5.1 Correlation, autocorrelation
    5.2 Frequency (spectral) domain
    5.3 Time series filters
    5.4 Fractals, wavelets, convolution
  • Machine learning basics
    6.1 What ML is and where it is used
    6.2 The tasks of classification, regression, clustering, ranking, and time series forecasting
    6.3 Basic ML algorithms
    6.4 Five historical paradigms of ML development
    Introductory information on the basics of machine learning (what it is; the tasks of classification, regression, and clustering), the main problems and metrics. Examples of machine learning applied to industry problems (recommendation systems).
    7.1 Data types
    7.2 Data preprocessing
    7.3 Overfitting and regularization
    7.4 Quality metrics (recall, precision, F1 score, ROC AUC, confusion matrix)
    7.5 Which algorithms are better suited to which tasks
  • Data Basics
    8.1 What data can be trusted?
    8.2 Basic methods of data manipulation
    8.3 Cognitive bias in data interpretation
    1) Review of the basic techniques of data manipulation (see the books "Statistics and Cats" ("Статистика и котики") and "How to Lie with Statistics")
    2) Interactive game: find the manipulation (examples from the books above)
    3) Interactive game: "deceive a friend" (the task is to run a public opinion poll on any topic, for example how many hours a day people watch TV, in a way that misleads others)
    4) Review of the additional task (find a data set from industry and either assess the reliability of a hypothesis or build a predictive algorithm). Extra points can be earned for this task
    5) Summing up, grading
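
Item 7.4 under "Machine learning basics" lists quality metrics that are all derived from the confusion matrix of a binary classifier. As a rough illustration only, the sketch below counts the four cells of that matrix and computes precision, recall, and the F1 score from them. The function names and the labels are hypothetical and are not taken from the course materials.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive and truth == positive:
            tp += 1
        elif prediction == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical true labels and model predictions for eight objects.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Which metric to prefer depends on the cost of each kind of error: precision matters more when false positives are expensive, recall when missed positives are.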
Assessment components

  • Computer-based test (non-blocking)
  • Homework 1 (non-blocking)
  • Homework 2 (non-blocking)
  • Homework 3 (non-blocking)
  • Homework 4 (non-blocking)
  • Homework 5 (non-blocking)
  • Homework 6 (non-blocking)
  • Homework 7 (non-blocking)
  • Homework 8 (non-blocking)
  • Homework 9 (non-blocking)
  • Homework 10 (non-blocking)
  • Homework 11 (non-blocking)
  • Written test (non-blocking)
  • Individual research project (non-blocking)
Interim assessment

  • Interim assessment (Module 2)
    0.06 * Homework 1 + 0.06 * Homework 2 + 0.06 * Homework 3 + 0.06 * Homework 4 + 0.06 * Homework 5 + 0.06 * Homework 6 + 0.06 * Homework 7 + 0.06 * Homework 8 + 0.06 * Homework 9 + 0.05 * Homework 10 + 0.05 * Homework 11 + 0.2 * Individual research project + 0.06 * Computer-based test + 0.1 * Written test
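
The formula above is a weighted sum whose weights add up to 1.0. A minimal sketch of the calculation follows; the English component names, the function name, and the example scores (a 10-point scale is assumed) are hypothetical and used only for illustration.

```python
# Weights taken from the interim assessment formula.
WEIGHTS = {f"Homework {i}": 0.06 for i in range(1, 10)}
WEIGHTS.update({
    "Homework 10": 0.05,
    "Homework 11": 0.05,
    "Individual research project": 0.20,
    "Computer-based test": 0.06,
    "Written test": 0.10,
})


def final_grade(scores):
    """Weighted sum of component scores; every component must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 8 on every component except a 10 on the research project.
scores = {name: 8 for name in WEIGHTS}
scores["Individual research project"] = 10

print(f"Sum of weights: {sum(WEIGHTS.values()):.2f}")
print(f"Final grade: {final_grade(scores):.2f}")
```

Because the research project carries a weight of 0.2, raising it by two points in this example raises the final grade by 0.4.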
Reading list

Recommended core reading

  • Gordon S. Linoff and Michael J.A. Berry. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition. John Wiley & Sons, 2011 (888 pages). ISBN: 9780470650936. Electronic text // books24x7 e-library: https://library.books24x7.com/toc.aspx?bookid=40629
  • Miroslav Kubat. An Introduction to Machine Learning. Springer, 2015 (296 pages). ISBN: 9783319200095. Electronic text // books24x7 e-library: https://library.books24x7.com/toc.aspx?bookid=117295
  • Darrell Huff. How to Lie with Statistics (Russian edition: Как лгать при помощи статистики). Moscow: Alpina Publisher, 2015 (163 pages). ISBN: 978-5-9614-5212-9. http://lib.alpinadigital.ru/ru/library/book/5573

Recommended additional reading

  • Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009 (745 pages).
  • Mohammed, Mohssen; Khan, Muhammad Badruddin; Bashier, Eihab Bashier Mohammed. Machine Learning: Algorithms and Applications. Auerbach Publications, 2017. https://library.books24x7.com/toc.aspx?bookid=117434