Postgraduate course 2020/2021

Discriminative Methods in Machine Learning

Type: Elective course
Area of studies: Informatics and Computer Engineering
When: 2 year, 1 semester
Mode of studies: offline
Language: English
ECTS credits: 5
Contact hours: 36

Course Syllabus

Abstract

This course gives an introduction to the most popular discriminative and differentiable machine learning methods used in supervised learning. After completing the discipline, the PhD student should know modern discriminative methods such as deep convolutional learning techniques and kernel machines; understand the limitations of learning methods and standard concepts such as overfitting and regularization; be aware of ongoing developments in machine learning; have hands-on experience with large-scale machine learning problems; know how to design and develop machine learning programs in the Python programming language; and be able to think critically about real data.
Learning Objectives

  • The learning objective of the course “Discriminative Methods in Machine Learning” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern discriminative learning techniques, such as: logistic regression, Support Vector Machines, regularization, neural networks, deep neural networks, limits on learning, deep learning techniques, Neural Turing Machines, performance evaluation techniques, and optimization algorithms.
Expected Learning Outcomes

  • Students know the basics of classification and decision making, performance evaluation, and machine bias (see the sketch after this list).
  • Students know standard discriminative methods such as linear and logistic regression, neural networks, collaborative filtering, word embeddings, decision trees, and similarity-based inference.
  • Students are introduced to similarity metrics and their concepts.
  • Students know techniques related to deep neural networks, such as convolutional layers and rectified linear units, as well as problems associated with deep neural networks, such as vanishing gradients.
  • Students know discriminative methods for sequential data such as text.
  • Students know differentiable systems that learn to operate memory access.
  • Students know advanced methods for training and regularizing neural networks.
  • Students are introduced to the theory of machine learning.
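
A minimal sketch in Python (not part of the course materials) illustrating the performance-evaluation vocabulary from the first outcome above: confusion-matrix counts and the true/false positive rates that underlie a ROC curve. The labels and scores are hypothetical toy data.

```python
# Toy performance evaluation: confusion-matrix counts and ROC coordinates
# for a scored binary classifier, swept over decision thresholds.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # ground-truth classes
scores = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1])   # hypothetical classifier scores

for threshold in (0.25, 0.5, 0.75):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives (type I errors)
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives (type II errors)
    tn = np.sum((y_pred == 0) & (y_true == 0))   # true negatives
    tpr = tp / (tp + fn)                         # recall / sensitivity, the ROC y-axis
    fpr = fp / (fp + tn)                         # fall-out, the ROC x-axis
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```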
Course Contents

  • Introduction to machine learning, Evaluation techniques
    Basic definitions of machine learning, principles and types of machine learning, performance metrics, errors and types of errors. ROC characteristics. Machine bias.
  • Basic methods
    Regression, Logistic regression, Support Vector Machines, Neural Networks, Collaborative Filtering, K-nearest Neighbor, decision trees, random forests.
  • Kernels and distance functions
    Kernel functions for real-valued vectors and for discrete models. Distance functions, edit distance, and information distance (see the edit-distance sketch after this list). Curse of dimensionality.
  • Deep Neural Networks
    Autoencoders, deep neural networks, stacked autoencoders, convolutional layers and max-pooling. Deep data vs. wide data, universal approximators, word embeddings.
  • Methods for sequential data
    Sequential data, recurrent neural networks, and long short-term memory models.
  • Neural Turing Machines
    Neural Turing Machines and their applications.
  • Optimization and Regularization
    Error surfaces; optimization and regularization methods: stochastic gradient descent, momentum methods, Polyak averaging, coordinate descent, adaptive learning rates, line search, AdaGrad, RMSProp; second-order methods: Levenberg–Marquardt, Newton, conjugate gradients, Broyden–Fletcher–Goldfarb–Shanno. Regularization: parameter norm penalties, early stopping, data augmentation, sparse coding, mini-batches vs. sharp minima, batch normalization (see the logistic-regression sketch after this list).
  • Algorithm-independent machine learning and no-free-lunch theorems
    Regularization, overfitting and underfitting, bias-variance decomposition in model selection, model capacity, minimum description length, parameters and hyperparameters, other problems: missing values and the class imbalance problem. Bootstrap and jackknife estimation. No-free-lunch theorems. Interpretability. Bias.
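
A minimal sketch, not from the course materials, of one distance function named in the “Kernels and distance functions” topic: the standard dynamic-programming recurrence for Levenshtein edit distance.

```python
# Levenshtein edit distance between two strings via dynamic programming,
# keeping only the previous row of the DP table.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                        # distance from a[:i] to the empty prefix
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```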
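
A second minimal sketch, assuming only NumPy and hypothetical toy data, tying the “Basic methods” and “Optimization and Regularization” topics together: logistic regression trained by stochastic gradient descent with momentum.

```python
# Logistic regression fitted with per-sample SGD plus classical momentum.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # linearly separable toy labels

w, b = np.zeros(2), 0.0
vw, vb = np.zeros(2), 0.0                    # momentum buffers (velocities)
lr, beta = 0.1, 0.9                          # learning rate, momentum coefficient

for epoch in range(20):
    for i in rng.permutation(len(X)):        # "stochastic": one sample at a time
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # sigmoid prediction
        gw, gb = (p - y[i]) * X[i], p - y[i]        # gradient of the log-loss
        vw = beta * vw - lr * gw             # momentum update of the velocity
        vb = beta * vb - lr * gb
        w, b = w + vw, b + vb

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```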
Assessment Elements

  • non-blocking Presence
  • non-blocking Exam
    Written exam. Preparation time: 180 min. The final exam consists of equally weighted problems; no material is allowed during the exam. Each question focuses on a particular topic presented during the lectures and consists of exercises on that topic. To be prepared for the final exam, PhD students must be able to answer questions on the topics covered in the lectures.
Interim Assessment

  • Interim assessment (1 semester)
    0.7 * Exam + 0.3 * Presence
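    For illustration, with hypothetical grades: an exam grade of 8 and a presence grade of 6 give a final grade of 0.7 × 8 + 0.3 × 6 = 7.4.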
Bibliography

Recommended Core Bibliography

  • James, G., et al. (2013). An Introduction to Statistical Learning. Springer. 426 pp.

Recommended Additional Bibliography

  • Wainwright, M. J., & Jordan, M. I. (2008). Graphical Models, Exponential Families, and Variational Inference. Boston: Now Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=352768