Master 2019/2020

Introduction to Data Science

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'
Area of studies: Management
When: Year 1, Module 2
Mode of studies: offline
Instructors: Косолапов Кирилл Вадимович, Adel Valiullin
Master’s programme: Business Strategies: Management and Consulting
Language: English
ECTS credits: 4
Contact hours: 30

Course Syllabus

Abstract

The course "Introduction to Data Science" covers the basics of data analysis, statistics, digital signal processing, and machine learning through a series of lectures and practical work using MS Excel and MS Azure tools.
Learning Objectives

  • Teach the basics of working with data (introduction to data science)
  • Introduce statistics
  • Introduce digital signal processing
  • Introduce machine learning
  • Introduce MS Excel and MS Azure
  • Develop critical thinking
Expected Learning Outcomes

  • Be able to calculate (1) the mode, median, and mean and (2) the variance and standard deviation
  • Be able to calculate confidence intervals
  • Be able to test hypotheses using statistical tests
  • Be able to calculate correlation
  • Be able to train machine learning algorithms for classification, clustering, and regression tasks
  • Be able to choose a metric for evaluating the quality of an algorithm
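
As an illustration of the first outcomes in the list above, the following is a minimal Python sketch of the descriptive statistics, a confidence interval for the mean, and a correlation coefficient. The course itself works in MS Excel and MS Azure, so the choice of Python, the function names, and the sample values are all assumptions made for this example. The confidence interval uses a normal approximation, which is also an assumption.

```python
import math
import statistics


def describe(values):
    """Return the mode, median, mean, sample variance and sample standard deviation."""
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std_dev": statistics.stdev(values),      # square root of the sample variance
    }


def mean_confidence_interval(values, confidence=0.95):
    """Confidence interval for the mean, assuming a normal approximation."""
    # Two-sided quantile of the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    centre = statistics.mean(values)
    return centre - half_width, centre + half_width


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient; assumes both series have non-zero spread."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical sample: hours of TV watched per day by eight respondents.
tv_hours = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(tv_hours))
print(mean_confidence_interval(tv_hours))
print(pearson_correlation([1, 2, 3, 4], [2, 4, 5, 9]))
```

For small samples, the normal quantile would usually be replaced by the corresponding quantile of Student's t distribution, which gives a wider interval.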
Course Contents

  • Introduction
    1.1 Introducing the teacher and the course
    1.2 Motivation: data analysis and ML
    1) Introduction to the course (what we will study, how work is assessed)
    2) Summary of course topics
    3) Introductory information on data analysis, with examples of use from industry
    4) Demonstration of data analysis on a non-obvious statistical example (for instance, "weight loss" and statistical significance)
    5) A few examples where the lack of competent analysis led to adverse consequences (for instance, the story of Bill Gates financing small schools, with regression to the mean shown through a heads-or-tails experiment)
  • Statistics. Distribution.
    2.1 Probability and distributions
    2.2 Distribution parameters (mode, median, mean, kurtosis, skewness, range, variance, standard deviation)
    Introductory information on the basics of statistics: distributions, histogram parameters (the difference between the median, mode, and arithmetic mean), significance level, variance, confidence intervals. Examples of statistical analysis on applied industry problems (distribution of viewers' income).
  • Statistics. Confidence intervals and hypothesis testing.
    3.1 The concept of a confidence interval
    3.2 Calculating the confidence interval, with examples
    3.3 Testing hypotheses using a confidence interval
    3.4 Calculating three sigma, with examples
  • Hypothesis testing
    4.1 Significance tests (criteria)
    4.2 Significance level
    4.3 Student's t-test, the chi-square test, the Mann-Whitney test
    4.4 Testing hypotheses using statistical tests
    4.5 Calculating the significance level; testing a hypothesis at significance level p
  • Correlation and other data processing methods
    5.1 Correlation, autocorrelation
    5.2 Spectral domain
    5.3 Time series filters
    5.4 Fractals, wavelets, convolution
  • Machine learning basics
    6.1 What ML is and where it is used
    6.2 The tasks of classification, regression, clustering, ranking, and time series forecasting
    6.3 Basic ML algorithms
    6.4 Five historical paradigms of ML development
    Introductory information about the basics of machine learning (what it is; the tasks of classification, regression, and clustering), the main problems and metrics. Examples of the use of machine learning on applied industry problems (recommendation systems).
    7.1 Data types
    7.2 Data preprocessing
    7.3 Overfitting and regularization
    7.4 Quality metrics (recall, precision, F1 measure, ROC AUC, confusion matrix)
    7.5 Which algorithms are better suited to which tasks
  • Data Basics
    8.1 What data can be trusted?
    8.2 Basic methods of data manipulation
    8.3 Cognitive bias in data interpretation
    1) Review of the basic techniques of data manipulation (see the books "Statistics and Kittens" and "How to Lie with Statistics")
    2) Interactive game: find the manipulation (examples from the books above)
    3) Interactive game: "deceive a friend" (task: conduct a public opinion poll on any topic, for example how many hours a day people watch TV, in a way that misleads others)
    4) Verification of the additional task (find an industry data set and either calculate the reliability of a hypothesis or build a predictive algorithm). Extra points can be earned for this task
    5) Summing up, grading
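
The confusion matrix and F1 measure named among the quality metrics in the machine learning topic above can be sketched as follows. This is a minimal illustration for binary classification only. The function names and the label lists are hypothetical and are not taken from the course materials. Precision and recall appear because the F1 measure is their harmonic mean.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(actual_labels, predicted_labels))
print(classification_metrics(actual_labels, predicted_labels))
```

Which of these metrics to report depends on the cost of each kind of error: precision matters more when false positives are expensive, recall when false negatives are.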
Assessment Elements

  • non-blocking Computer-based testing
  • non-blocking Homework 1
  • non-blocking Homework 2
  • non-blocking Homework 3
  • non-blocking Homework 4
  • non-blocking Homework 5
  • non-blocking Homework 6
  • non-blocking Homework 7
  • non-blocking Homework 8
  • non-blocking Homework 9
  • non-blocking Homework 10
  • non-blocking Homework 11
  • non-blocking Written test
  • non-blocking Individual research project
Interim Assessment

  • Interim assessment (Module 2)
    0.06 * Homework 1 + 0.06 * Homework 2 + 0.06 * Homework 3 + 0.06 * Homework 4 + 0.06 * Homework 5 + 0.06 * Homework 6 + 0.06 * Homework 7 + 0.06 * Homework 8 + 0.06 * Homework 9 + 0.05 * Homework 10 + 0.05 * Homework 11 + 0.06 * Computer-based testing + 0.1 * Written test + 0.2 * Individual research project
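
The weighted sum above can be checked with a short Python sketch. The weights are taken from the formula. The dictionary keys, the function name, and the example scores are hypothetical, and the 10-point scale used in the example is an assumption.

```python
# Weights from the interim assessment formula; the key names are hypothetical.
WEIGHTS = {
    **{f"homework_{i}": 0.06 for i in range(1, 10)},  # Homework 1 to 9
    "homework_10": 0.05,
    "homework_11": 0.05,
    "computer_testing": 0.06,
    "written_test": 0.10,
    "research_project": 0.20,
}


def interim_grade(scores):
    """Return the weighted interim grade for a mapping of element name to score."""
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical student: 8 on every element except 10 on the research project.
example_scores = {name: 8 for name in WEIGHTS}
example_scores["research_project"] = 10
print(round(sum(WEIGHTS.values()), 10))  # the weights should add up to 1
print(round(interim_grade(example_scores), 2))
```

Because the weights add up to 1, the result stays on the same scale as the individual scores, and the individual research project alone accounts for a fifth of the grade.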
Bibliography

Recommended Core Bibliography

  • Gordon S. Linoff and Michael J.A. Berry. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition. John Wiley & Sons, 2011 (888 pages). ISBN: 9780470650936. Electronic text // books24x7 electronic library system. https://library.books24x7.com/toc.aspx?bookid=40629
  • Miroslav Kubat. An Introduction to Machine Learning. Springer, 2015 (296 pages). ISBN: 9783319200095. Electronic text // books24x7 electronic library system. https://library.books24x7.com/toc.aspx?bookid=117295
  • Darrell Huff. How to Lie with Statistics (Russian edition: Как лгать при помощи статистики). Moscow: Alpina Publisher, 2015. 163 pp. ISBN 978-5-9614-5212-9. http://lib.alpinadigital.ru/ru/library/book/5573

Recommended Additional Bibliography

  • Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009. 745 pp.
  • Mohammed, M., Khan, M. B., Bashier, E. B. M. Machine Learning: Algorithms and Applications. Auerbach Publications, 2017. https://library.books24x7.com/toc.aspx?bookid=117434