Introduction to Machine Learning and Data Mining
- To familiarize students with the new, rapidly evolving field of machine learning and data mining, and to provide practical experience in the analysis of real-world data.
- Students know the basic notions and terminology used in machine learning and data mining (MLDM).
- Students understand fundamental principles of modern data analysis.
- Students develop mathematical models for MLDM tasks.
- Students analyze real-world data.
- Introduction to Machine Learning and Data Mining: Introduction to modern data analysis. Machine Learning. Data Mining and Knowledge Discovery in Databases. Course structure. Basic tasks and examples.
- Clustering and its basic techniques: The task of clustering. K-means and its modifications (k-medoids and fuzzy c-means clustering). Density-based methods: DBSCAN and Mean Shift. Hierarchical clustering. Quality criteria.
- Classification and its basic techniques: The task of classification. 1-Rules. K-Nearest Neighbours approach. Naïve Bayes. Decision Trees. Logistic Regression. Quality assessment: precision, recall, F-measure, loss function, confusion matrix, cross-validation, and learning curves (ROC, lift, etc.). Multi-class and multi-label classification.
- Frequent Itemset Mining and Association Rules: Frequent itemsets. Apriori and FP-growth algorithms. Association rules. Interestingness measures: support and confidence. Closed itemsets. Connection with Lattice Theory and Formal Concept Analysis. Applications.
- Feature Selection and Dimensionality Reduction. Outlier Detection: Feature selection versus feature extraction and generation. Singular Value Decomposition, Latent Semantic Analysis, and Principal Component Analysis. Boolean Matrix Factorization. Outlier and novelty detection techniques.
- Recommender Systems and Algorithms: Collaborative filtering. User-based and item-based methods. Slope One. Association-rule-based and bicluster-based techniques. Quality assessment: MAE, precision, and recall. SVD-based approaches: PureSVD, SVD++, and timeSVD. Factorization machines.
- Ensemble Clustering and Classification: Ensemble clustering methods for aggregating k-means partitions. Ensemble methods of classification: Bagging, Boosting, and Random Forest.
- Multimodal relational clustering: Biclustering. Spectral co-clustering. Triclustering. Two-mode networks. Folksonomies and resource-sharing systems. Multimodal approaches. Applications: community detection in Social Network Analysis and gene expression analysis.
- Artificial Neural Methods and Stochastic Optimization. Elements of Statistical Learning: Artificial Neural Networks. Basic ideas of Deep Learning. (Stochastic) gradient descent. Statistical (Bayesian) view of Machine Learning.
- Machine Learning Tools and Big Data: Orange, Weka, and Scikit-learn. Machine Learning for Big Data: Apache Spark.
- Research project
- Exam: The final exam consists of an oral project defense, during which the student may be asked to answer theoretical or practical questions.
- Han, J., Kamber, M., Pei, J. Data Mining: Concepts and Techniques, Third Edition. – Morgan Kaufmann Publishers, 2011. – 740 pp.
- Witten, I. H., Frank, E., Hall, M. A. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. – Morgan Kaufmann Publishers, 2011. – 664 pp.