Machine Learning and Data Mining
- To familiarize students with the new and rapidly evolving field of machine learning and data mining, and to provide practical experience in the analysis of real-world data.
- Students know the statement of the No-Free-Lunch theorems and can explain the role of prior knowledge in solving machine learning problems.
- Students derive the bias-variance decomposition for the MSE and 0-1 losses, and show how regularization affects the trade-off (a worked statement of the MSE case follows this list).
- Students explain the concepts of bootstrapping, bagging and boosting, and justify the choice of a particular weak learner for a given aggregating algorithm.
- Students explain the relation between linear models and deep neural networks, describe how neural networks are trained, and understand the role of the data scientist in designing a deep learning solution to a machine learning problem.
- Students understand the principles of Generative Adversarial Networks, know which metrics they can optimize and how to regularize them.
- Students explain and apply black-box optimization techniques.
- Students apply techniques for working with imbalanced datasets.
- Students explain the main approaches to probabilistic graphical models and their training.
- Students understand the principles behind Variational AutoEncoders and implement them.
- Students know meta-learning approaches.
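
For reference, the MSE case of the decomposition mentioned above can be stated compactly. This is the standard identity for y = f(x) + ε with zero-mean noise of variance σ²; the notation (an estimator f̂_D trained on a dataset D) is ours, not fixed by the course materials:

```latex
% Expected squared error at a fixed point x, averaged over datasets D and noise:
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Regularization typically increases the bias term while shrinking the variance term, which is the trade-off the outcome above refers to.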
- Introduction to Machine Learning and Data Mining, No-Free-Lunch theorems. The No-Free-Lunch theorems and a discussion of the role of prior knowledge in Machine Learning; the general Machine Learning workflow; the assumptions behind the most popular Machine Learning methods.
- Bias-variance decomposition, regularization techniques. Model complexity through the bias-variance decomposition; methods to control the complexity of models; the most common regularization techniques.
- Introduction to meta-algorithms, bootstrap, boosting. Meta-algorithms as a tool for regulating the bias/variance of a model. Introduction to the bootstrap: Random Forest. Stacking. Introduction to boosting: AdaBoost, Gradient Boosting Machine, XGBoost. (See the bagging vs. boosting sketch after this list.)
- Introduction and overview of deep learning methods. Introduction to Deep Learning through the lens of the No-Free-Lunch theorems; the popularity of Deep Learning methods; the correspondence between the most common Deep Learning methods and their prior assumptions.
- Deep generative models: Generative Adversarial Networks (GANs). Jensen-Shannon divergence and Wasserstein distance as minimization problems. Adversarial networks: the classical GAN, WGAN, the energy-based GAN. Difficulties of adversarial training; the gradient penalty for WGAN. Practical applications beyond generative problems: Adversarial AutoEncoder, BiGAN, CycleGAN, Adversarial Variational Bayes. (See the minimal GAN training loop after this list.)
- Optimization techniques: black-box methods, first-order methods. Brief overview of first-order optimization methods: stochastic gradient descent, momentum, Adam/Adamax. Detailed discussion of black-box optimization methods: Bayesian optimization, variational optimization. Examples of black-box optimization: hyper-parameter tuning. (See the variational-optimization sketch after this list.)
- Miscellaneous topics: imbalanced datasets, importance sampling, one-class classification methods. The problems caused by imbalanced datasets, particularly for gradient-based methods: change of priors, importance sampling. One-class classification: one-class SVM, density-based methods, and popular heuristics such as dimensionality reduction (e.g. through AutoEncoders) and Radial Basis Networks. (See the class-reweighting sketch after this list.)
- Deep generative models: energy-based models, Boltzmann machines and deep belief networks. Definition of a generative problem, types of generative problems. Energy-based models and contrastive divergence: Boltzmann machines, Deep Belief Networks, Restricted Boltzmann Machines and their connection to AutoEncoders.
- Deep generative models: Variational AutoEncoders. Variational bounds on the likelihood, the Variational AutoEncoder, the Conditional Variational AutoEncoder. (See the minimal VAE loss sketch after this list.)
- Meta-learning: concept learning, learning how to learn. Concept learning: Neural Statistician, Generative Matching Networks. Learning how to learn: the optimization procedure as a learning problem, gradient-based optimization algorithms.
- Interim assessment (module 2). Final score for the homework: homework score = min[1, ∑ᵢ xᵢ] − penalty, where xᵢ is the score for homework i. The final grade weighs homework and exam equally: Final grade = 50% × (homework score) + 50% × (exam score). Since each homework has a max score of 1 and there are 3 assignments, the homework score is scaled by 5/3 in this formula; the max exam score is 10, so it is scaled by 1/2. Final grade = [5/3 ⋅ homework score + 1/2 ⋅ exam score].
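
The sketches below are not part of the course materials; they are minimal, hedged illustrations of techniques named in the contents above. First, the bagging vs. boosting contrast from the meta-algorithms topic, assuming scikit-learn (version ≥ 1.2 for the `estimator=` keyword) and an illustrative synthetic dataset: deep trees (low bias, high variance) suit variance-reducing bagging, while decision stumps (high bias, low variance) suit bias-reducing boosting.

```python
# Bagging vs. boosting and the matching choice of weak learner (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging-style ensemble: averaging many deep trees reduces variance.
bagging = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: sequentially reweighting a weak learner reduces bias.
# (scikit-learn >= 1.2 uses `estimator=`; older versions use `base_estimator=`.)
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump as weak learner
    n_estimators=200,
    random_state=0,
)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```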
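
A minimal classical (non-saturating) GAN training loop on toy 1-D data, assuming PyTorch; the architectures, learning rates, and target distribution are illustrative choices, not prescribed by the course.

```python
# Minimal classical GAN: D learns real vs. fake, G learns to fool D.
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0   # target: N(2, 0.25)
noise = lambda n: torch.randn(n, 8)                   # generator input

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # outputs logits

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Discriminator step: push real samples toward 1, generated samples toward 0.
    x, z = real_data(64), noise(64)
    loss_d = bce(D(x), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: non-saturating loss, push D(G(z)) toward "real".
    loss_g = bce(D(G(noise(64))), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(noise(1000)).mean().item())  # should drift toward the target mean, 2.0
```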
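
A sketch of variational optimization from the black-box topic: instead of differentiating a possibly non-differentiable objective f, minimize E over x ~ N(μ, σ²) of f(x) with respect to μ via the score-function gradient estimator. Only NumPy is assumed; f and all constants are illustrative.

```python
# Variational optimization: minimize E_{x ~ N(mu, sigma^2)}[f(x)] over mu.
# Score-function gradient: d/dmu E[f] = E[f(x) * (x - mu) / sigma^2].
import numpy as np

def f(x):
    # Non-differentiable black-box objective (illustrative): minimum at x = 3.
    return np.abs(x - 3.0)

rng = np.random.default_rng(0)
mu, sigma, lr, n_samples = 0.0, 1.0, 0.1, 64

for step in range(500):
    xs = rng.normal(mu, sigma, size=n_samples)   # sample candidate solutions
    fx = f(xs)
    baseline = fx.mean()                         # variance-reduction baseline
    grad_mu = np.mean((fx - baseline) * (xs - mu)) / sigma**2
    mu -= lr * grad_mu

print(mu)  # should end up close to the true minimizer, 3.0
```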
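
A sketch of the "change of priors" reweighting for imbalanced datasets, assuming scikit-learn; the manual inverse-frequency weights below reproduce what `class_weight="balanced"` computes, and the dataset is an illustrative synthetic one.

```python
# Class reweighting for an imbalanced binary problem (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Manual "balanced" weights: w_c = n / (n_classes * n_c), i.e. inverse frequency.
n, counts = len(y_tr), np.bincount(y_tr)
weights = {c: n / (len(counts) * counts[c]) for c in range(len(counts))}

plain = LogisticRegression().fit(X_tr, y_tr)
weighted = LogisticRegression(class_weight=weights).fit(X_tr, y_tr)

for name, m in [("unweighted", plain), ("reweighted", weighted)]:
    print(name, balanced_accuracy_score(y_te, m.predict(X_te)))
```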
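
Finally, a minimal Variational AutoEncoder loss, assuming PyTorch: the negative ELBO as a Bernoulli reconstruction term plus the analytic KL between the diagonal Gaussian posterior and the standard normal prior. Layer sizes and the random stand-in batch are illustrative.

```python
# Minimal VAE: negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs (mu, log_var)
        self.dec = nn.Linear(z_dim, x_dim)       # outputs Bernoulli logits

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization
        return self.dec(z), mu, log_var

def neg_elbo(x, logits, mu, log_var):
    # Bernoulli reconstruction + analytic KL for a diagonal Gaussian posterior.
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return (rec + kl) / x.shape[0]

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784).round()          # stand-in for a binarized image batch
loss = neg_elbo(x, *model(x))
opt.zero_grad(); loss.backward(); opt.step()
```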
- Hall, M., Witten, I. H., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. – Morgan Kaufmann Publishers, 2011. – 664 pp.
- Han, J., Kamber, M., Pei, J. Data Mining: Concepts and Techniques, Third Edition. – Morgan Kaufmann Publishers, 2011. – 740 pp.
- Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. – Springer, 2009. – 745 pp.
- Mirkin, B. Core Concepts in Data Analysis: Summarization, Correlation and Visualization. – Springer Science & Business Media, 2011. – 388 pp.