Statistical Learning Theory

Master 2020/2021

Category 'Best Course for Career Development'

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'

Category 'Best Course for New Knowledge and Skills'

Type: Elective course (Applied Statistics with Network Analysis)

Area of studies: Applied Mathematics and Informatics

Delivered by: International laboratory for Applied Network Research

Where: International laboratory for Applied Network Research

When: 1 year, 3, 4 module

Mode of studies: distance learning

Instructors: Valentina Kuskova, Nada Lavrač, Ljupco Todorovski

Master’s programme: Applied Statistics with Network Analysis

Language: English

ECTS credits: 8

Contact hours: 48

Full Syllabus

Abstract

Statistical learning is the field that sets the framework for machine learning drawn from statistics and functional analysis. The goal of statistical learning theory is to study, in a statistical frame-work, the properties of learning algorithms. This study serves a two-fold purpose. On one hand it provides strong guarantees for existing algorithms, and on the other hand suggests new algo-rithmic approaches that are potentially more powerful. In this course we will go in detail into the theory and methods of statistical learning, and in par-ticular complexity regularization (i.e., how do you choose the complexity of your model when you have to learn it from data). This issue is at the heart of the most successful and popular ma-chine learning algorithms today, and it is critical for their success. In this course you'll learn some of the techniques needed for the analysis and near-optimal tuning of such algorithms. This is an elective course, offered to MASNA students, and examples used in class may differ depending on students’ interests.

Learning Objectives

The course gives students an important foundation to develop and conduct their own research as well as to evaluate research of others.

Expected Learning Outcomes

Know the basic concepts from statistical learning theory.
Know theoretical foundation of why some machine learning algorithms are successful in a large range of applications, with special emphasis on statistics.
Be able to calculate sizes of training sets for several machine learning tasks in the context of PAC-learning (and hence calculate VC-dimensions).
Have theoretical understanding of several online learning algorithms and learning with expert advice.
Know several paradigms in statistical learning theory to select models (Structural risk minimiza-tion, Maximal likelihood, Minimal Description Length, etc.).
Know the link between cryptography and computational limitations of statistical learning.
Be able to develop and/or foster critical reviewing skills of published empirical research using applied statistical methods.
Be able to to criticize constructively and determine existing issues with applied linear models in published work .
Have a training of mathematical skills such as abstract thinking, formal thinking and problem solving;
Have in-depth understanding of boosting algorithms and a few other algorithms for machine learning.

Course Contents

Probably approximately correct learning
Several examples of simple learning algorithms are studied such as for learning of axis aligned rectangles. The general problem is formulated. Instance classes that are not learnable are dis-cussed.
VC-dimensions
We define VC-dimensions and prove the fundamental theorem of statistical learning theory: if the VCdimension of the set of classifiers is finite, then we can learn a correct classifier with any precision using finitely many samples. Moreover, the theorem gives us upperbound for the sam-ples in terms of the VCdimension and the parameters of our precision measure.
Structural risk minimization
We consider a class of problems that is learnable in the sense above, and relax our criteria for learning to what is called nonuniform learnability. We provide a characterization of classifier spaces that are learnable in this sense and again this theorem gives us sample bounds. We discuss a special framework called Minimum Description Length principle and discuss some applications.
The time complexity of learning and cryptography
A typical cryptographic encoding of a string is an example of a an object with a clear structure, but of which no learning algorithm can find the structure of the object in a reasonable time. In this part we discuss some learning tasks that are impossible for computational reasons. Under a plausible computational complexity assumption (which is required for secure RSA encryption) one can show that neural networks of small depth and regular languages can not be learned.
Boosting
A weak learning algorithm can generate from a train set a model that is slightly better than ran-dom guessing. We observe that are weak learnable coincide with the classes that are PAC-learnable and present some efficient algorithms to transform weak learners to the stronger PAC-learning algorithms. We study AdaBoost and DeepBoost, two algorithms that are currently very popular in machine learning. Furthermore we prove performance guarantees and given an alter-native explanation of their success using game theory.
Online learning
In online learning there is initially no training set. After each prediction, the class of the label is declared and the learning algorithm can use this information to imporve the prediction model. We study 1) prediction with expert advice, 2) linear classification algorithm such as the perceptron algorithm and 3) the connection between online learning and game theory.
Probabilistic formulations of prediction problems
Plug-in estimators, empirical risk minimization; linear threshold functions, perceptron algorithm, risk bounds (concertation inequalities, uniform convergence; convex surrogate losses for classi-fication).
Game-theoretic formulations of prediction problems
Minimax strategies for log loss, linear loss, and quadratic loss, universal portfolios and online convex optimization.
Neural networks
Stochastic gradient methods, combinatorial dimensions and Rademacher averages, hardness re-sults for learning, efficient learning algorithms.

Assessment Elements

Homework Assignments (5 x Varied points)
In-Class Labs (9-10 x Varied points)
Quizzes (Best 9 of 10, Varied points)
Final In-Class or Take-home exam (at the discretion of the instructor)

Interim Assessment

Interim assessment (4 module)
0.5 * Final In-Class or Take-home exam (at the discretion of the instructor) + 0.2 * Homework Assignments (5 x Varied points) + 0.2 * In-Class Labs (9-10 x Varied points) + 0.1 * Quizzes (Best 9 of 10, Varied points)

Bibliography

Recommended Core Bibliography

Alpaydin, E. (2014). Introduction to Machine Learning (Vol. Third edition). Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836612
Harman, G., & Kulkarni, S. (2007). Reliable Reasoning : Induction and Statistical Learning Theory. Cambridge, Mass: A Bradford Book. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=189264
Kulkarni, S., Harman, G., & Wiley InterScience (Online service). (2011). An Elementary Introduction to Statistical Learning Theory. Hoboken, N.J.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=391376
Mohri, M., Talwalkar, A., & Rostamizadeh, A. (2012). Foundations of Machine Learning. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=478737
Murphy, K. P. (2012). Machine Learning : A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968

Recommended Additional Bibliography

Haroon, D. (2017). Python Machine Learning Case Studies : Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
Lantz, B. (2013). Machine Learning with R : Learn How to Use R to Apply Powerful Machine Learning Methods and Gain an Insight Into Real-world Applications. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=656222
Mathur, P. (2019). Machine Learning Applications Using Python : Cases Studies From Healthcare, Retail, and Finance. [Berkeley, California]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1982259
Ramasubramanian, K., & Singh, A. (2017). Machine Learning Using R. [Place of publication not identified]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402990
Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python : A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293

Course Syllabus