Master
2020/2021



How to Win a Data Science Competition: Learn from Top Kagglers
Type:
Compulsory course (Data Mining)
Area of studies:
Applied Mathematics and Informatics
Delivered by:
Department of Applied Mathematics and Informatics
When:
2nd year, 3rd module
Mode of studies:
distance learning
Instructors:
Olga Razvenskaya
Master’s programme:
Data Mining
Language:
English
ECTS credits:
3
Contact hours:
2
Course Syllabus
Abstract
The study of this discipline is based on the following courses:
- Machine learning
- Data analysis methods
To master the discipline, students must have knowledge and competencies in:
- Programming methods
- Linear algebra
- Probability and statistics
Students can apply the main provisions of the discipline in their professional activities.
https://www.coursera.org/learn/competitive-data-science
Learning Objectives
- The purpose of the discipline is to acquaint students with modern methods of data analysis and machine learning and their use in data analysis competitions
Expected Learning Outcomes
- Be able to choose a data processing method and apply it
- Be able to choose a cross-validation method and use it to evaluate the quality of the chosen data processing method
- Be able to solve the problems posed in data analysis competitions
Course Contents
- Data processing: The tasks of predictive modeling. Preprocessing of data. Collection and processing of data from various sources, texts and images. Advanced feature engineering techniques: generating mean encodings, using aggregated statistical measures, and finding nearest neighbors as a means to improve the predictions
- Methodology of cross-validation: Reliable cross-validation methodologies. Overfitting and underfitting. Advanced techniques to overcome overfitting and underfitting. Analysis and interpretation of data. Comparison of data analysis algorithms
- Competitions in data analysis: Examples of competitions. General principles for approaching a data analysis competition problem. Key issues for a good solution
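Two of the topics listed above, mean encodings and reliable cross-validation, are often combined in practice. The following is a minimal sketch of that combination, not material taken from the course: each row's category is replaced by the mean target computed only on the other folds, so a row's own target never leaks into its encoding. The function names `kfold_indices` and `out_of_fold_mean_encoding`, the contiguous fold split, and the fallback to the training-fold mean for unseen categories are all assumptions made for illustration.

```python
from collections import defaultdict


def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous folds (hypothetical helper)."""
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_mean_encoding(categories, targets, n_folds=3):
    """Encode each row's category by the mean target of that category
    computed on the other folds only (illustrative sketch)."""
    n_rows = len(categories)
    if n_folds < 2 or n_rows < n_folds:
        raise ValueError("need at least 2 folds and at least one row per fold")

    encoded = [0.0] * n_rows
    for valid_idx in kfold_indices(n_rows, n_folds):
        valid_set = set(valid_idx)
        sums = defaultdict(float)   # per-category target sum on training rows
        counts = defaultdict(int)   # per-category row count on training rows
        train_total, train_count = 0.0, 0
        for i in range(n_rows):
            if i in valid_set:
                continue  # held-out rows must not contribute to the statistics
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
            train_total += targets[i]
            train_count += 1
        # Assumption: categories unseen in the training folds get the
        # training-fold mean target as their encoding.
        prior = train_total / train_count
        for i in valid_idx:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else prior
    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "a", "b"]
    ys = [1, 0, 1, 1, 0, 0]
    print(out_of_fold_mean_encoding(cats, ys, n_folds=3))
```

In a real competition the folds would normally be shuffled or stratified and the encoding smoothed toward the prior; this sketch leaves both out to keep the leakage-avoidance idea visible.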
Bibliography
Recommended Core Bibliography
- Muller, A. C., & Guido, S. (2017). Introduction to machine learning with Python: a guide for data scientists. O’Reilly Media. (HSE access: http://ebookcentral.proquest.com/lib/hselibrary-ebooks/detail.action?docID=4698164)
Recommended Additional Bibliography
- Witten, I. H., et al. (2017). Data mining: Practical machine learning tools and techniques. Morgan Kaufmann. 654 pp.