Master 2020/2021

## Predictive Modelling

Category 'Best Course for Career Development'
Type: Elective course (Big Data Systems)
When: 2nd year, modules 1 and 2
Mode of studies: offline
Master’s programme: Big Data Systems
Language: English
ECTS credits: 6

### Course Syllabus

#### Abstract

Predictive Modeling is a statistics course taught to second-year graduate students over the first and second academic modules. The material ranges from classical topics, such as linear and non-linear regression and classification, to less frequently covered ones, such as Markov chain Monte Carlo, dynamic linear models, and multivariate time series analysis. For each model considered, particular attention is paid to performance assessment, with the aim of minimizing forecast error. Throughout the course, a balance is maintained between mathematical rigor and intuition. Often this trade-off is resolved in favor of illustrative examples, which help students grasp the main idea and learn to apply it in practice rather than memorize derivations. Nonetheless, brief and tractable proofs are provided wherever they are pedagogically useful. Some moderately difficult theoretical questions are left for home assignments, so that students work with pen and paper and gain a deeper understanding of the underlying theory. Practical skills are developed through in-class practice sessions and home assignments involving real-life datasets.

#### Learning Objectives

• Predictive Modeling gives insight into machine learning algorithms, with emphasis on assessing prediction accuracy and selecting among models. A secondary aim of the course is to guide students' research by suggesting more challenging topics and problems to those who are interested. This kind of work develops self-study skills and critical thinking and highlights the importance of literature review.

#### Expected Learning Outcomes

• understand the theory behind predictive modeling, the types of predictive models, and the key steps of model creation and evaluation
• be aware of practical applications of predictive modeling, from science to business
• know how to implement different types of models in R or Python
• acquire the skills to use functions from different R or Python packages to pre-process input data
• apply the knowledge and tools of predictive analytics to real-life applications

#### Course Contents

• Introduction
Key parts of predictive models. Concepts of model building. Examples of real-life applications and projects, e.g. assessment of the consumer basket of large retailers, analysis of customer outflow from a retail network, tornado prediction, etc.
• Predictive modeling process
Basics of statistics. Summary Statistics. Correlations between variables. Correlation analysis. Missing data. Data splitting and preprocessing.
• Reducing the dimension
Factor analysis. Principal component analysis. Criteria for determining the number of factors. Rotation methods.
• Regression models
Measuring performance in regression models. Linear regression. Partial least squares and penalized models. Logistic regression. Probit and logit models.
• Time series analysis
Review of univariate analysis of stationary time series. AR(p) time series process. MA(q) time series process. ARMA(p, q) time series process. Multivariate analysis of stationary time series characteristics. Vector autoregressive model. Diagnostic tests, causality analysis. Forecasting.
• Classification models
Measuring performance in classification models. Sensitivity and specificity. Overfitting. Receiver operating characteristic curves. K-nearest neighbors. Linear and logistic regression. Nearest shrunken centroids. Neural networks. Support vector machines. Naive Bayes. Predictors. Candidate models. Optimal model. Performance estimation. Model tuning and model evaluation. Tuning parameters. Resampling. k-fold cross-validation.
• Clustering
Hierarchical clustering. k-means clustering. Distribution-based clustering methods. Minimal spanning tree. Pattern analysis. Evaluation and assessment.
• Markov Chain Monte Carlo methods
Goals of Markov chain Monte Carlo (MCMC). Markov processes. Properties of Markov chains. The stationary state of the chain. Monte Carlo simulations of distributions. Inverse CDF method. Issues in chain efficacy. MCMC implementation in R/Python and examples. Applications of MCMC: modeling returns of the S&P 500 index.
• Dynamic linear models
Bayesian framework. State space models. Examples of nonlinear and non-Gaussian state space models. State estimation and forecasting: the Kalman filter for dynamic linear models. Smoothing. Controllability and observability of time-invariant DLMs. Filter stability.

#### Assessment Elements

• Home assignments
Home assignments are done individually unless stated otherwise; some may be done in groups. Each student has to prepare a report and submit it to the instructors before the deadline.
• Class activity
• Final examination
The exam is a written test on the StartExam platform with asynchronous proctoring by Examus. See https://elearning.hse.ru/en/student_steps/ for the rules of the exam. The exam consists of several questions: some require a short answer, others are matching or multiple-choice questions. The exam is open-book, so students may use slides, Python, Google, home assignments, etc., but only on the computer on which Examus records the exam. Students are not allowed to use a mobile phone or any other device, or to communicate with classmates or anyone else during the exam.

#### Interim Assessment

• Interim assessment (module 2)
0.3 * Class activity + 0.3 * Final examination + 0.4 * Home assignments
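
As an illustration of how the weights combine, the formula above can be sketched in Python. The function name `interim_grade` and the example scores are hypothetical, and the sketch assumes all three components are graded on the same scale; the syllabus does not specify a scale or a rounding rule, so none is applied here.

```python
def interim_grade(class_activity: float,
                  final_exam: float,
                  home_assignments: float) -> float:
    """Weighted sum from the syllabus formula:
    0.3 * Class activity + 0.3 * Final examination + 0.4 * Home assignments.
    No rounding is applied, since the syllabus does not state a rounding rule.
    """
    return 0.3 * class_activity + 0.3 * final_exam + 0.4 * home_assignments


# Hypothetical scores, assumed to share one grading scale.
example = interim_grade(class_activity=8, final_exam=6, home_assignments=7)
print(f"Interim grade: {example:.2f}")
```

Because the weights sum to 1, equal scores on all three components give that same score as the interim grade.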

#### Recommended Core Bibliography

• Lantz, B. (2019). Machine Learning with R: Expert Techniques for Predictive Modeling (3rd ed.). Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2106304
• Ayyadevara, V. K. (2018). Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R. Apress.