• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site
Master 2020/2021

Statistical Learning Theory

Area of studies: Applied Mathematics and Informatics
When: 2 year, 3 module
Mode of studies: distance learning
Instructors: Valentina Kuskova, Nada Lavrač, Ljupco Todorovski
Master’s programme: Applied Statistics with Network Analysis
Language: English
ECTS credits: 4

Course Syllabus


Statistical learning is the field that sets the framework for machine learning drawn from statistics and functional analysis. The goal of statistical learning theory is to study, in a statistical frame-work, the properties of learning algorithms. This study serves a two-fold purpose. On one hand it provides strong guarantees for existing algorithms, and on the other hand suggests new algo-rithmic approaches that are potentially more powerful. In this course we will go in detail into the theory and methods of statistical learning, and in par-ticular complexity regularization (i.e., how do you choose the complexity of your model when you have to learn it from data). This issue is at the heart of the most successful and popular ma-chine learning algorithms today, and it is critical for their success. In this course you'll learn some of the techniques needed for the analysis and near-optimal tuning of such algorithms. This is an elective course, offered to MASNA students, and examples used in class may differ depending on students’ interests.
Learning Objectives

Learning Objectives

  • The course gives students an important foundation to develop and conduct their own research as well as to evaluate research of others.
Expected Learning Outcomes

Expected Learning Outcomes

  • Know the basic concepts from statistical learning theory.
  • Know theoretical foundation of why some machine learning algorithms are successful in a large range of applications, with special emphasis on statistics.
  • Be able to calculate sizes of training sets for several machine learning tasks in the context of PAC-learning (and hence calculate VC-dimensions).
  • Have theoretical understanding of several online learning algorithms and learning with expert advice.
  • Know several paradigms in statistical learning theory to select models (Structural risk minimiza-tion, Maximal likelihood, Minimal Description Length, etc.).
  • Know the link between cryptography and computational limitations of statistical learning.
  • Be able to develop and/or foster critical reviewing skills of published empirical research using applied statistical methods.
  • Be able to to criticize constructively and determine existing issues with applied linear models in published work .
  • Have a training of mathematical skills such as abstract thinking, formal thinking and problem solving;
  • Have in-depth understanding of boosting algorithms and a few other algorithms for machine learning.
Course Contents

Course Contents

  • Probably approximately correct learning
    Several examples of simple learning algorithms are studied such as for learning of axis aligned rectangles. The general problem is formulated. Instance classes that are not learnable are dis-cussed.
  • VC-dimensions
    We define VC-dimensions and prove the fundamental theorem of statistical learning theory: if the VCdimension of the set of classifiers is finite, then we can learn a correct classifier with any precision using finitely many samples. Moreover, the theorem gives us upperbound for the sam-ples in terms of the VCdimension and the parameters of our precision measure.
  • Structural risk minimization
    We consider a class of problems that is learnable in the sense above, and relax our criteria for learning to what is called nonuniform learnability. We provide a characterization of classifier spaces that are learnable in this sense and again this theorem gives us sample bounds. We discuss a special framework called Minimum Description Length principle and discuss some applications.
  • The time complexity of learning and cryptography
    A typical cryptographic encoding of a string is an example of a an object with a clear structure, but of which no learning algorithm can find the structure of the object in a reasonable time. In this part we discuss some learning tasks that are impossible for computational reasons. Under a plausible computational complexity assumption (which is required for secure RSA encryption) one can show that neural networks of small depth and regular languages can not be learned.
  • Boosting
    A weak learning algorithm can generate from a train set a model that is slightly better than ran-dom guessing. We observe that are weak learnable coincide with the classes that are PAC-learnable and present some efficient algorithms to transform weak learners to the stronger PAC-learning algorithms. We study AdaBoost and DeepBoost, two algorithms that are currently very popular in machine learning. Furthermore we prove performance guarantees and given an alter-native explanation of their success using game theory.
  • Online learning
    In online learning there is initially no training set. After each prediction, the class of the label is declared and the learning algorithm can use this information to imporve the prediction model. We study 1) prediction with expert advice, 2) linear classification algorithm such as the perceptron algorithm and 3) the connection between online learning and game theory.
  • Probabilistic formulations of prediction problems
    Plug-in estimators, empirical risk minimization; linear threshold functions, perceptron algorithm, risk bounds (concertation inequalities, uniform convergence; convex surrogate losses for classi-fication).
  • Game-theoretic formulations of prediction problems
    Minimax strategies for log loss, linear loss, and quadratic loss, universal portfolios and online convex optimization.
  • Neural networks
    Stochastic gradient methods, combinatorial dimensions and Rademacher averages, hardness re-sults for learning, efficient learning algorithms.
Assessment Elements

Assessment Elements

  • non-blocking Homework Assignments (5 x Varied points)
  • non-blocking In-Class Labs (9-10 x Varied points)
  • non-blocking Quizzes (Best 9 of 10, Varied points)
  • non-blocking Final In-Class or Take-home exam (at the discretion of the instructor)
Interim Assessment

Interim Assessment

  • Interim assessment (3 module)
    0.5 * Final In-Class or Take-home exam (at the discretion of the instructor) + 0.2 * Homework Assignments (5 x Varied points) + 0.2 * In-Class Labs (9-10 x Varied points) + 0.1 * Quizzes (Best 9 of 10, Varied points)


Recommended Core Bibliography

  • Alpaydin, E. (2014). Introduction to Machine Learning (Vol. Third edition). Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836612
  • Harman, G., & Kulkarni, S. (2007). Reliable Reasoning : Induction and Statistical Learning Theory. Cambridge, Mass: A Bradford Book. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=189264
  • Kulkarni, S., Harman, G., & Wiley InterScience (Online service). (2011). An Elementary Introduction to Statistical Learning Theory. Hoboken, N.J.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=391376
  • Mohri, M., Talwalkar, A., & Rostamizadeh, A. (2012). Foundations of Machine Learning. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=478737
  • Murphy, K. P. (2012). Machine Learning : A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968

Recommended Additional Bibliography

  • Haroon, D. (2017). Python Machine Learning Case Studies : Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
  • Lantz, B. (2013). Machine Learning with R : Learn How to Use R to Apply Powerful Machine Learning Methods and Gain an Insight Into Real-world Applications. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=656222
  • Mathur, P. (2019). Machine Learning Applications Using Python : Cases Studies From Healthcare, Retail, and Finance. [Berkeley, California]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1982259
  • Ramasubramanian, K., & Singh, A. (2017). Machine Learning Using R. [Place of publication not identified]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402990
  • Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python : A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293