- Predictive Modeling gives insight into machine learning algorithms, with emphasis on assessing prediction accuracy and selecting among candidate models. An indirect purpose of the course is to guide students' research by suggesting more challenging topics and problems to those who are interested. This kind of activity develops self-study skills and critical thinking and highlights the importance of literature review.
- understand the theory behind predictive modeling, the types of predictive models, and the key steps of model creation and evaluation
- be aware of practical applications of predictive modeling from science to business
- know how to implement different types of models in R or Python
- acquire the skills to use functions from different R/Python packages to pre-process the input data
- apply the knowledge and tools of predictive analytics to real-life applications
- Introduction: Key parts of predictive models. Concepts of model building. Examples of real-life applications and projects, e.g. assessment of the consumer basket of large retailers, analysis of customer churn in a retail network, tornado prediction, etc.
- Predictive modeling process: Basics of statistics. Summary statistics. Correlations between variables. Correlation analysis. Missing data. Data splitting and preprocessing.
- Reducing the dimension: Factor analysis. Principal component analysis. Criteria for determining the number of factors. Rotation methods.
- Regression models: Measuring performance in regression models. Linear regression. Partial least squares and penalized models. Logistic regression. Probit and logit models.
- Time series analysis: Review of univariate analysis of stationary time series. AR(p) time series process. MA(q) time series process. ARMA(p, q) time series process. Multivariate analysis of stationary time series characteristics. Vector autoregressive model. Diagnostic tests, causality analysis. Forecasting.
- Classification models: Measuring performance in classification models. Sensitivity and specificity. Overfitting. Receiver operating characteristic curves. K-nearest neighbors. Linear and logistic regressions. Nearest shrunken centroids. Neural networks. Support vector machines. Naive Bayes. Predictors. Candidate models. Optimal model. Performance estimation. Concept of overfitting. Model tuning and model evaluation. Tuning parameters. Resampling. k-fold cross-validation.
- Clustering: Hierarchical clustering. k-means clustering. Distribution-based clustering methods. Minimal spanning tree. Pattern analysis. Evaluation and assessment.
- Markov Chain Monte Carlo methods: Goals of Markov Chain Monte Carlo (MCMC). Markov processes. Properties of Markov chains. The stationary state of the chain. Monte Carlo simulations of distributions. Inverse CDF method. Issues in chain efficacy. MCMC implementation in R/Python and examples. Applications of MCMC: modeling returns of the S&P 500 index.
- Dynamic linear models: Bayesian framework. State space models. Examples of nonlinear and non-Gaussian state space models. State estimation and forecasting: the Kalman filter for dynamic linear models. Smoothing. Controllability and observability of time-invariant DLMs. Filter stability.
- Home assignments: Done by students individually, although some home assignments can be done in groups. Each student has to prepare a report and submit it to the instructors before the deadline.
- Class activity
- Final examination: The exam is a written test on the StartExam platform with asynchronous proctoring by Examus (rules: https://elearning.hse.ru/en/student_steps/). The exam consists of several questions: some require a short answer, while others are matching or multiple-choice questions. The exam is open-book, so students may use slides, Python, Google, home assignments, etc., but only on the computer on which Examus records the exam. Students are not allowed to use a mobile phone or any other device, or to communicate with classmates or anyone else during the exam.
- Interim assessment (module 2): 0.3 * Class activity + 0.3 * Final examination + 0.4 * Home assignments
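As a worked illustration of the interim assessment formula, here is a minimal Python sketch. The function name `interim_grade`, the dictionary keys, and the sample scores are hypothetical, and a 10-point scale is assumed. The syllabus does not state a rounding rule, so none is applied.

```python
# Weights taken from the interim assessment formula in the syllabus.
WEIGHTS = {
    "class_activity": 0.3,
    "final_examination": 0.3,
    "home_assignments": 0.4,
}


def interim_grade(scores):
    """Return the weighted sum of the three component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores on an assumed 10-point scale.
example = {"class_activity": 8, "final_examination": 6, "home_assignments": 9}
print(interim_grade(example))
```

With these made-up scores the result is approximately 7.8 (0.3 * 8 + 0.3 * 6 + 0.4 * 9), up to floating-point rounding.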
- Lantz, B. (2019). Machine Learning with R: Expert Techniques for Predictive Modeling (3rd ed.). Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2106304
- Ayyadevara, V. K. (2018). Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R. Apress.
- Gupta, D. (2018). Applied Analytics Through Case Studies Using SAS and R: Implementing Predictive Models and Machine Learning Techniques. Apress.
- Kubat, M. (2017). An Introduction to Machine Learning (2nd ed.). Springer.