Applied Machine Learning

Master 2021/2022

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'

Type: Compulsory course (Information Analytics in Enterprise Management)

Area of studies: Business Informatics

Delivered by: Department of Business Informatics

Where: Faculty of Economics

When: Year 1, Module 3

Mode of studies: distance learning

Open to: students of all HSE University campuses

Instructors: Sergey Lisitsyn

Master’s programme: Information Analytics in Enterprise Management

Language: English

ECTS credits: 5

Contact hours: 28

Abstract

Machine learning is the field of study concerned with finding dependencies in data automatically. It makes it possible to solve a wide range of problems without explicitly programming rules. Owing to advances in computing and in the field itself, over the last decade machine learning has become an essential component of products ranging from web services to banking. In this course, students review the essential concepts of machine learning and then practice applying machine learning methods to business tasks. The course emphasizes practice and considers various aspects of solving real-world problems. It covers widely used methods, including linear methods, gradient boosting, and neural networks. Finally, the course considers the best practices of major companies that use machine learning.

Learning Objectives

Learn to frame a business problem as a machine learning problem
Practice fitting models to solve essential machine learning problems such as regression and classification
Learn to design and develop machine learning systems
Learn to reuse pre-trained models to lower the development cost of machine learning systems

Expected Learning Outcomes

Can identify a problem suitable for machine learning
Able to apply the gradient boosting approach to solve classification and regression problems
Able to fit a logistic regression model on a given dataset
Able to fit and interpret a decision tree model on a given dataset
Able to identify a clustering problem
Able to identify classification, regression, and clustering problems
Able to identify overfitting
Able to identify a suitable metric for a machine learning system
Able to train a neural network given a dataset
Able to use pre-trained models
Can fit a clustering model given a dataset
Can identify a recommender problem
Knows at least a few modern applications of machine learning
Knows the essential rules for developing and supporting machine learning systems
Knows the limitations of linear models
Knows the relationship between model complexity and overfitting
Understands the boosting approach to create an ensemble of models
Understands the concept of differentiable programming
Understands the concept of embeddings
Understands the concept of non-parametric learning
Understands the essential methods for recommenders: collaborative filtering, content-based, and matrix factorization
Understands the idea of convolution as the basic operation for image and audio data
Understands the universality of the gradient boosting approach

Course Contents

Scope of machine learning
Machine learning problems
Linear models for regression and classification
Decision trees and ensembles
Overfitting
Boosting and gradient boosting
Recommender systems and embeddings
Non-parametric methods for classification and regression
Clustering
Metrics of machine learning
Neural networks
Convolutional neural networks
Machine learning in production systems

Assessment Elements

Homework №1
A student should provide a Jupyter notebook.
Homework №2
A student should either provide a Jupyter notebook to the professor, or participate in an in-class Kaggle competition.
Test
There is no time limit for submitting test responses. Tests are taken online without proctoring.
Written exam
Format: the exam is taken in written form, online, as a programming assignment. The MS Teams platform is used to communicate with students. Students are not allowed to involve any other person in their programming assignment. Any interaction with other students that gives an advantage on the assignment is prohibited, as is any plagiarism. Students are allowed to use any Internet resources and to ask the professor to clarify the assignment.

Interim Assessment

2021/2022 3rd module
0.25 * Homework №2 + 0.4 * Written exam + 0.1 * Test + 0.25 * Homework №1
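
As an illustration of the formula above, the weighted sum can be sketched in a few lines of Python. The weights come from the formula itself. The function name, the dictionary keys, the example scores, and the 10-point scale are assumptions made for this example only. No rounding rule is stated here, so none is applied.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {
    "homework_1": 0.25,
    "homework_2": 0.25,
    "test": 0.10,
    "written_exam": 0.40,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores (no rounding applied)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {"homework_1": 8, "homework_2": 6, "test": 10, "written_exam": 7}
print(interim_grade(example))  # approximately 7.3
```

Because the written exam carries the largest weight (0.4), a one-point change in the exam score moves the result by 0.4, while a one-point change in the test moves it by only 0.1.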

Bibliography

Recommended Core Bibliography

D. Sculley, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, & Michael Young. (n.d.). Machine Learning: The High-Interest Credit Card of Technical Debt. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.BAEF1F2C
Goodfellow, I. (2016). Deep Learning.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning : Data Mining, Inference, and Prediction (Vol. Second edition, corrected 7th printing). New York: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=277008
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective.
Harrington, P. (2012). Machine Learning in Action.
Mitchell, T. M. (1997). Machine Learning.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
Segaran, T. (2007). Programming Collective Intelligence : Building Smart Web 2.0 Applications. Beijing: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=415280

Recommended Additional Bibliography

Caselles-Dupré, H., Lesaint, F., & Royo-Letelier, J. (2018). Word2Vec applied to Recommendation: Hyperparameters Matter. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1804.04212