Data Analysis

Academic year: 2019/2020
Language of instruction: English
ECTS credits: 5
Course type: Compulsory course
When: 3rd year, modules 3 and 4

Instructors


Бритков Радомир Александрович


Matyushin, Leonid


Хатбуллина Лейля Равилевна


Shestakoff, Andrey

Course Syllabus

Abstract

This course presents the foundations of a rapidly developing scientific field known as intelligent data analysis, or machine learning. The field studies algorithms that automatically adjust to data and extract valuable structure and dependencies from it. This automatic adjustment makes machine learning an especially convenient tool for analyzing large volumes of data with complicated and diverse structure, which is common in the modern "information era". The course covers the most common machine learning problems, including classification, regression, dimensionality reduction, clustering, collaborative filtering, and ranking, and presents the best-known and most widely used algorithms for solving them. For each algorithm, its data assumptions, advantages, and disadvantages, as well as its connections with other algorithms, are analyzed to provide an in-depth and critical understanding of the subject. Much attention is given to developing practical skills: students apply the studied algorithms to real data, critically analyze their output, and solve theoretical problems that highlight important concepts of the course. Machine learning algorithms are applied using the Python programming language and its scientific libraries, which are also taught during the course. The course is designed for students of the bachelor's program "Software Engineering" at the Faculty of Computer Science, HSE.
Learning Objectives

  • familiarize students with the major data analysis problems solved with machine learning (classification, regression, dimensionality reduction, clustering, collaborative filtering, and ranking)
  • acquaint students with the major algorithms for solving these problems
  • give students a critical understanding of the subject, highlighting each algorithm's limitations, the data assumptions it relies upon, and its strengths and weaknesses
  • teach students one of the most commonly used tools for machine learning: the Python programming language together with its major data analysis libraries (numpy, scipy, pandas, matplotlib) and the machine learning library scikit-learn
  • give students practical experience in applying the studied methods to real datasets
Expected Learning Outcomes

  • to know the major data analysis problems solved with machine learning
  • to understand scientific articles about data analysis and machine learning
  • to know the major algorithms for solving these problems
  • to understand the relationships between algorithms and their advantages and disadvantages
  • to know the Python programming language together with its major data analysis libraries (numpy, scipy, pandas, matplotlib) and the machine learning library scikit-learn
  • to understand which kinds of algorithms are most appropriate for which kinds of data
  • to know the whole pipeline of research and development of machine learning methods
  • to know how to transform data to make it more suitable for machine learning algorithms
Course Contents

  • Introduction to data science and machine learning
  • K nearest neighbours method
  • Decision trees
  • Model evaluation
  • Linear classifier methods
  • Support vector machines
  • Regression
  • Boosting
  • Other ensemble methods: bagging, random forest, etc.
  • Feedforward neural networks
  • Convolutional neural networks
  • Feature selection and dimensionality reduction
  • Introduction to NLP
  • Clustering
  • Introduction to recommendation systems
Assessment Elements

  • non-blocking Homework
  • non-blocking Colloquium
  • non-blocking Competitions
    Participation in a machine learning competition gives students an opportunity to earn extra points and to gain practical experience in applying the studied methods to a real data analysis task. The task is to build a prediction system; [competition score] is set according to the accuracy of the developed prediction system.
  • non-blocking Exam
    The exam is oral and held in Zoom, without proctoring. Students may use all of their own materials during the exam. A grade may be awarded automatically, without taking the exam. Technical requirements: webcam, microphone, headphones or speakers, Zoom.
Interim Assessment

  • Interim assessment (4 module)
    [score] = 0.7*[cumulative score] + 0.3*[exam score], where [cumulative score] = 0.8*[homework score] + 0.2*[colloquium score] + 0.2*[competition score]
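The interim assessment formula can be sketched in Python as a worked example. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, all component scores are assumed to be on the same scale, and no rounding or capping is applied because the syllabus does not specify either. Note that the weights of the cumulative score sum to 1.2, which is consistent with the competition being a source of extra points.

```python
def cumulative_score(homework, colloquium, competition=0.0):
    """Cumulative score as written in the syllabus formula.

    The weights (0.8, 0.2, 0.2) sum to 1.2; the competition term is treated
    here as optional extra credit (an assumption, defaulting to 0).
    """
    return 0.8 * homework + 0.2 * colloquium + 0.2 * competition


def final_score(homework, colloquium, exam, competition=0.0):
    """Final score: 0.7 * cumulative score + 0.3 * exam score.

    No rounding or upper cap is applied, since the syllabus states neither.
    """
    cumulative = cumulative_score(homework, colloquium, competition)
    return 0.7 * cumulative + 0.3 * exam


# Hypothetical student: homework 8, colloquium 7, competition 5, exam 6.
example = final_score(homework=8, colloquium=7, exam=6, competition=5)
print(round(example, 2))  # 0.7 * 8.8 + 0.3 * 6 = 7.96
```

For the same hypothetical student without competition points, the cumulative score drops from 8.8 to 7.8 and the final score from 7.96 to 7.26.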
Bibliography

Recommended Core Bibliography

  • Bishop, C. M. (n.d.). Pattern Recognition and Machine Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 745 pp.
  • Mohri, M., Talwalkar, A., & Rostamizadeh, A. (2012). Foundations of Machine Learning. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=478737

Recommended Additional Bibliography

  • Murphy, K. P. (2012). Machine Learning : A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968