Machine Learning and Data Mining Methods
- To familiarize students with the new, rapidly evolving field of machine learning and data mining, and to provide practical experience in the analysis of real-world data.
- Students derive the bias-variance decomposition for MSE and “0-1” losses, and show how regularization affects the tradeoff.
- Students explain and apply black-box optimization techniques.
- Students explain the concepts of bootstrapping, bagging and boosting, and justify the choice of a particular weak learner for a given aggregating algorithm.
- Students explain the main approaches to probabilistic graphical models and their training.
- Students explain the relation between linear models and deep neural networks, describe how neural networks are trained, and understand the role of the data scientist in designing a deep learning solution to a machine learning problem.
- Students know meta-learning approaches.
- Students know the statement of No-Free-Lunch theorems and explain the role of prior knowledge for solving machine learning problems.
- Students understand the principles behind Variational AutoEncoders and implement them.
- Students understand the principles of Generative Adversarial Networks, know which metrics they can optimize and how to regularize them.
- Students apply techniques for working with imbalanced datasets.
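For reference, the bias-variance decomposition for the MSE loss mentioned above can be written as follows (assuming the standard setup $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and an estimator $\hat{f}$ trained on a random sample):

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Regularization typically increases the bias term while shrinking the variance term, which is the tradeoff the course outcome refers to.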
- Introduction to Machine Learning and Data Mining, No-Free-Lunch theorems
- Bias-variance decomposition, regularization techniques
- Introduction to meta-algorithms, bootstrap, boosting
- Introduction and overview of deep learning methods
- Deep generative models: Generative Adversarial Networks (GANs)
- Optimization techniques: black-box methods, first order methods
- Miscellaneous topics: imbalanced datasets, importance sampling, one-class classification methods
- Deep generative models: energy-based models, Boltzmann machines and deep belief networks
- Deep generative models: Variational AutoEncoders
- Meta-learning: concept learning, learning how to learn
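As a minimal illustration of the bootstrap topic listed above, the sketch below estimates the sampling variance of a statistic by resampling with replacement (the function name `bootstrap_variance` and the toy data are my own, not course material):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_variance(data, estimator, n_boot=1000, rng=rng):
    """Estimate the sampling variance of `estimator` via the bootstrap:
    draw n_boot resamples of the data with replacement, apply the
    estimator to each, and take the empirical variance of the results."""
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(data, size=n, replace=True)  # bootstrap resample
        stats[b] = estimator(sample)
    return stats.var(ddof=1)

# Toy data: 200 draws from N(5, 2^2); the true variance of the sample
# mean is sigma^2 / n = 4 / 200 = 0.02, which the bootstrap should recover.
data = rng.normal(loc=5.0, scale=2.0, size=200)
var_mean = bootstrap_variance(data, np.mean)
print(var_mean)
```

The same resampling step is the building block of bagging: each bootstrap sample trains one member of the ensemble, and the aggregated prediction has lower variance than any single member.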
- 2021/2022, 2nd module. Final score for the homework: *homework score* = min [1, Σᵢ xᵢ] − penalty, where xᵢ is the score for each homework. (Final grade) = 50% × (*homework score*) + 50% × (*exam score*):
  - since each homework has a max score of 1 and there are 3 assignments, the homework score is scaled by 5/3 in this formula;
  - the max exam score is 10, so it is scaled by 1/2.

  *Final grade* = [5/3 ⋅ *homework score* + 1/2 ⋅ *exam score*]
- Hall, M., Witten, I. H., Frank, E. Data Mining: practical machine learning tools and techniques. – 2011. – 664 pp.
- Han, J., Kamber, M., Pei, J. Data Mining: Concepts and Techniques, Third Edition. – Morgan Kaufmann Publishers, 2011. – 740 pp.
- Hastie, T., Tibshirani, R., Friedman, J. The elements of statistical learning: Data Mining, Inference, and Prediction. – Springer, 2009. – 745 pp.
- Mirkin, B. Core concepts in data analysis: summarization, correlation and visualization. – Springer Science & Business Media, 2011. – 388 pp.