Machine Learning in Python

2021/2022

Best course by the criterion "Usefulness of the course for your future career"

Best course by the criterion "Usefulness of the course for broadening your horizons and well-rounded development"

Best course by the criterion "Novelty of the knowledge gained"

Status: University-wide elective

Taught by: Department of Big Data and Information Retrieval

When: Modules 1 and 2

Audience: open to all

Instructors: Makarov Mikhail Sergeevich, Melnikov Oleg, Tikhonova Maria Ivanovna

Language: English

Credits: 4

Contact hours: 56

Full Syllabus

Abstract

This course introduces students to the elements of machine learning and deep learning, including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, regularized methods, and topics in neural networks. Students use the Python programming language and popular packages, such as pandas, scikit-learn, and TensorFlow, to investigate and visualize datasets and to develop machine learning models for data-driven regression, classification, and unsupervised problems. ❕PREREQUISITES❕ include prior coding experience in a high-level programming language (preferably Python), prior coursework in statistics, linear algebra, and calculus, reading and writing fluency in English, and basic familiarity with machine learning. ❗STUDY LOAD❗ is 10 hours per week for well-prepared students, but may be higher for students lacking the prerequisites. ⚠️IMPORTANT⚠️: this is an active, heavily hands-on course. There are weekly machine learning assignments, including in-class Kaggle competitions (as group activities), quizzes on material from the course textbook, and occasional DataCamp courses covering the necessary prerequisite concepts. There are also weekly seminars and lectures, 80 minutes each.

Learning Objectives

The course aims to help students understand the process of learning from data, to familiarize them with a wide variety of algorithmic and model-based methods for extracting information from data, and to teach them to apply and evaluate suitable methods on various datasets through model selection and predictive performance evaluation.

Course Contents

Academic Integrity, Honor, Ethics
Review of Calculus, Linear Algebra, Probability, Statistics, Python, Colab
Introduction to Statistical Learning
Linear Regression and K-Nearest Neighbors (KNN)
Classification: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, KNN
Resampling Methods: Cross-Validation (CV), Bootstrap
Linear Model Selection and Regularization
Non-linear Regression
Decision Trees, Bagging, Random Forest, Boosting
Support Vector Machines (SVM)
Clustering and Dimension Reduction: k-Means, Hierarchical Clustering (HC), DBSCAN, PCA
Artificial Neural Networks (ANN) and Introduction to Deep Learning
Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM)
Convolutional Neural Networks (CNN)
Deep Generative Models and Autoencoders

Assessment Elements

Quizzes
All questions and answers are in English. Quizzes closely follow the textbook, lectures, seminars, and material posted in the LMS, and include questions about the syllabus and the ethics/integrity/honor code.
Homework assignments
Students will likely work in groups of about two. Collaboration outside the group is allowed only at a high level. See the grading rubric and syllabus for further instructions.
Participation
See the syllabus for more information.

Interim Assessment

2021/2022 2nd module
0.4 * homework assignments + 0.2 * participation + 0.4 * Quizzes
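
As a rough illustration of the weighting above, the following Python sketch computes the interim grade from the three components. The function name `interim_grade`, the dictionary keys, and the 10-point scale of the sample scores are assumptions made for this example; only the weights 0.4, 0.2, and 0.4 come from the formula.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {"homework": 0.4, "participation": 0.2, "quizzes": 0.4}


def interim_grade(scores):
    """Return the weighted sum of the component scores.

    `scores` maps each component name in WEIGHTS to a numeric score.
    The grading scale is an assumption; the formula itself is scale-free.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {"homework": 8.0, "participation": 10.0, "quizzes": 7.0}
print(round(interim_grade(example), 2))
```

With these hypothetical scores the weighted sum is 0.4 * 8 + 0.2 * 10 + 0.4 * 7 = 8.0. Any rounding rule applied to the final grade is not specified here and is left out of the sketch.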

Bibliography

Recommended Core Bibliography

Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.

Recommended Additional Bibliography

Trevor Hastie, Robert Tibshirani, et al. (2017). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Available free online: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf
