Master's Programme "Data Science"

21 April

Introduction to Machine Learning and Data Mining

Academic year: 2020/2021
Language of instruction: English (ENG)
Credits: 5
Status: Elective course
When taught: 1st year, modules 3 and 4

Instructor

Course Syllabus

Abstract

The course “Machine Learning and Data Mining” introduces students to the new and actively evolving interdisciplinary field of modern data analysis. Having started as a branch of Artificial Intelligence, it attracted the attention of physicists, computer scientists, economists, computational biologists, linguists, and others, and has become a truly interdisciplinary field of study. Despite the variety of data sources that can be analyzed, the objects and attributes that form a particular dataset possess common statistical and structural properties. The interplay between known and unknown data gives rise to complex pattern structures and machine learning methods, which are the focus of this course. We will consider methods of Machine Learning and Data Mining, with special attention to hands-on practical analysis of real-world datasets using available software tools and modern programming languages and libraries.
Learning Objectives

  • To familiarize students with the new, rapidly evolving field of machine learning and data mining, and to provide practical knowledge and experience in the analysis of real-world data.
Expected Learning Outcomes

  • Students know basic notions and terminology used in MLDM.
  • Students understand fundamental principles of modern data analysis.
  • Students develop mathematical models of MLDM.
  • Students analyze real world data.
Course Contents

  • Introduction to Machine Learning and Data Mining
    Introduction to modern data analysis. Machine Learning. Data Mining and Knowledge Discovery in Databases. Course structure. Basic tasks and examples.
  • Clustering and its basic techniques
    The task of clustering. K-means and its modifications (k-medoids and fuzzy c-means clustering). Density-based methods: DBSCAN and Mean Shift. Hierarchical clustering. Quality criteria.
  • Classification and its basic techniques
    The task of classification. 1-Rules. K-Nearest Neighbours approach. Naïve Bayes. Decision Trees. Logistic Regression. Quality assessment: precision, recall, F-measure, loss function, confusion matrix, cross-validation and learning curves (ROC, lift etc.). Multi-class and multi-label classification.
  • Frequent Itemset Mining and Association Rules
    Frequent itemsets. Apriori and FP-growth algorithms. Association rules. Interestingness measures: support and confidence. Closed itemsets. Connection with Lattice Theory and Formal Concept Analysis. Applications.
  • Feature Selection and Dimensionality Reduction. Outlier detection
    Feature selection versus feature extraction and generation. Singular Value Decomposition, Latent Semantic Analysis and Principal Component Analysis. Boolean Matrix Factorization. Outlier and novelty detection techniques.
  • Recommender Systems and Algorithms
    Collaborative filtering. User-based and item-based methods. Slope One. Association-rule-based and bicluster-based techniques. Quality assessment: MAE, precision and recall. SVD-based approaches: pureSVD, SVD++ and time-SVD. Factorization machines.
  • Ensemble Clustering and Classification
    Ensemble clustering methods for aggregating k-means partitions. Ensemble methods of classification: Bagging, Boosting, and Random Forest.
  • Multimodal relational clustering
    Biclustering. Spectral co-clustering. Triclustering. Two-mode networks. Folksonomies and resource-sharing systems. Multimodal approaches. Applications: community detection in Social Network Analysis and gene expression analysis.
  • Artificial Neural Methods and Stochastic Optimization. Elements of Statistical Learning
    Artificial Neural Networks. Basic ideas of Deep Learning. (Stochastic) gradient descent. Statistical (Bayesian) view of Machine Learning.
  • Machine Learning Tools and Big Data
    Orange, Weka, and Scikit-learn. Machine Learning for Big Data: Apache Spark.
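As an illustration of the quality measures listed under "Classification and its basic techniques" above, the following is a minimal sketch of how precision, recall, and the F-measure can be computed for a binary classifier. It is not taken from the course materials: the function name `precision_recall_f1`, the choice of the balanced F1 variant, and the toy labels are assumptions made only for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # Entries of the binary confusion matrix that the three measures need.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical ground-truth labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4.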
Assessment Elements

  • non-blocking Homework
  • non-blocking Research project
  • non-blocking Exam
    The final exam consists of an oral project defense, during which a student may be asked to answer some theoretical or practical questions. The grade is calculated by a formula that takes the accumulated grade into account. The exam is held remotely in oral form (project defense) on the Zoom platform. Students must join the exam according to the defense schedule that the instructor sends to the students' group email the day before the exam. Additional theoretical questions may be asked, or small practical tasks that do not require programming may be given.
Interim Assessment

  • Interim assessment (module 4)
    0.2 * Exam + 0.4 * Homework + 0.4 * Research project
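The weighted sum above can be sketched as a small function. The weights come from the formula; the function name `interim_grade` and the sample component grades are hypothetical, and the syllabus does not state the grading scale or any rounding rule, so none is applied here.

```python
def interim_grade(exam: float, homework: float, project: float) -> float:
    """Weighted sum from the syllabus: 0.2 * Exam + 0.4 * Homework + 0.4 * Research project."""
    return 0.2 * exam + 0.4 * homework + 0.4 * project


# Hypothetical component grades, chosen only for illustration.
grade = interim_grade(exam=8, homework=6, project=10)
print(round(grade, 2))
```

With these sample values the components contribute 1.6, 2.4, and 4.0 respectively, which shows that the homework and the research project together carry 80% of the weight.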
Bibliography

Recommended Core Bibliography

  • Han, J., Kamber, M., Pei, J. Data Mining: Concepts and Techniques, Third Edition. – Morgan Kaufmann Publishers, 2011. – 740 pp.

Recommended Additional Bibliography

  • Hall, M., Witten, I. H., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. – 2011. – 664 pp.