Delivered at: Big Data and Information Retrieval School

Course type: Compulsory course

When: Year 1, Module 3

Instructor

Sanochkin, Yuriy

Abstract

Exploring Data Science requires a background in probability and statistics. This course introduces the necessary parts of probability theory, guiding you from the very basics up to the level required to jump-start your ascent in Data Science. The core concept of the course is the random variable, i.e. a variable whose values are determined by a random experiment. Random variables serve as models for the data-generating processes we want to study. Properties of the data are closely linked to the corresponding properties of random variables, such as expected value, variance and correlation. Dependencies between random variables are what allow us to predict unknown quantities from known values, which forms the basis of supervised machine learning. We begin with the notions of independent events and conditional probability, then introduce the two main classes of random variables, discrete and continuous, and study their properties. We'll discuss the law of large numbers and the central limit theorem, both of which are crucial for statistics. Finally, we'll build our own classification algorithm based on probabilistic models. While introducing the theory, we'll pay special attention to the practical aspects of working with probabilities, including probabilistic simulations in Python. This course requires basic knowledge of discrete mathematics (combinatorics) and calculus (derivatives, integrals, a bit of limits), as well as basic Python programming skills.
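
As an illustration of the kind of probabilistic simulation mentioned in the abstract, here is a minimal sketch that models a fair six-sided die as a random variable and compares the average of simulated rolls with the expected value. The die example, the function name `simulate_die_mean` and the fixed seed are assumptions chosen for illustration; they are not taken from the course materials.

```python
import random


def simulate_die_mean(n_rolls, seed=0):
    """Return the average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls


# Expected value of a fair die: (1 + 2 + ... + 6) / 6
expected_value = sum(range(1, 7)) / 6

# The sample mean should drift toward the expected value as n grows.
for n in (10, 1_000, 100_000):
    print(n, simulate_die_mean(n))
```

With more rolls the printed averages should settle near the expected value, which is the behaviour the law of large numbers describes.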

Learning Objectives

The aim of the course is to introduce notions of probability theory that are used in machine learning.
Upon completion of this course, students will be able to:
- express real-life problems in terms of events, probabilities and random variables;
- use the law of total probability and Bayes' rule;
- analyze discrete and continuous random variables;
- understand different ways to define a random variable;
- study relations between random variables;
- model and study random variables using Python.

Expected Learning Outcomes

Explain the notions of conditional probability and independence of events, describe the Bernoulli scheme, and understand the law of total probability and Bayes' rule.
Find conditional and unconditional probabilities and check events for independence.
Solve probabilistic problems using the law of total probability and Bayes' rule.
Understand the notions and elementary properties of discrete random variables, expected value and variance.
Calculate the expected value, variance, probability distribution and probability mass function (PMF).
Generate discrete random variables and visualize them with Python.
Know and explain the Bernoulli, binomial, geometric and Poisson distributions.
Interpret applied problems as probabilistic models and gain a certain level of intuition about them.
Calculate the joint PMF and marginal distributions, check for independence, and find covariance and correlation.
Understand the notions of covariance and correlation.
Understand how expected value and variance behave under arithmetic operations on random variables in a system.
Know what a system of random variables is and be able to provide examples of such systems.
Understand what the joint probability distribution and marginal distributions are, and define independence of random variables.
Understand the notions of a continuous random variable, cumulative distribution function (CDF) and probability density function (PDF), independence, covariance and correlation.
Understand the notions of joint CDF and PDF for systems of random variables.
Distinguish between discrete, continuous and mixed random variables.
Know the definitions of expected value and variance of a continuous random variable.
Know the statement of the law of large numbers.
Know the statement of the central limit theorem.
Know the properties of the normal distribution.
Know and apply Chebyshev's inequality.
Know the definition of the multinomial distribution.
Apply the notions of conditional probability, random variables and their distributions to design a classifier algorithm.
Apply Bayes' rule to select the most plausible hypothesis.
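
To make the outcomes on the law of total probability and Bayes' rule concrete, the following minimal sketch computes posterior probabilities for two hypotheses and picks the most plausible one. The scenario (a fair coin versus a coin that lands heads with probability 3/4, each equally likely beforehand, with one observed head) and the helper name `posterior` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    # Law of total probability: P(E) = sum over H of P(E | H) * P(H)
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Hypothetical setup: which coin was tossed, given that it came up heads?
priors = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihoods = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}  # P(heads | coin)

post = posterior(priors, likelihoods)
best = max(post, key=post.get)  # hypothesis with the largest posterior

print(post)
print("most plausible hypothesis:", best)
```

Exact fractions are used instead of floats so that the posteriors can be checked by hand: the evidence is 1/4 + 3/8 = 5/8, giving posteriors 2/5 for the fair coin and 3/5 for the biased one.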

Course Contents

1. Conditional probability and independence of events
2. Random variables
3. Systems of random variables. Properties of expected value and variance
4. Continuous random variables
5. Law of large numbers and central limit theorem
6. Practical project: constructing a Bayesian classifier

Assessment Elements

Quizzes
Weekly quizzes.
Staff Graded Assignment
2 SGAs during the course
Final Project

Interim Assessment

2023/2024 3rd module
0.4 * Final Project + 0.2 * Quizzes + 0.4 * Staff Graded Assignment
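
The weighting above can be applied as in the short sketch below. The function name `interim_grade` and the sample scores are hypothetical, and the grading scale is an assumption, since the syllabus does not state it; the only requirement is that all three components use the same scale.

```python
def interim_grade(final_project, quizzes, sga):
    """Weighted interim grade: 0.4 * Final Project + 0.2 * Quizzes + 0.4 * SGA."""
    return 0.4 * final_project + 0.2 * quizzes + 0.4 * sga


# Hypothetical component scores, all on the same (assumed) scale
grade = interim_grade(final_project=8, quizzes=6, sga=7)
print(round(grade, 2))
```

Because the weights sum to 1, the result stays on the same scale as the component scores.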

Bibliography

Recommended Core Bibliography

Matloff, N. S. (2020). Probability and Statistics for Data Science : Math + R + Data. Chapman and Hall/CRC.

Recommended Additional Bibliography

Linde, W. (2017). Probability Theory : A First Course in Probability Theory and Statistics. [N.p.]: De Gruyter. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1438416

Master’s Programme 'Master of Data Science'

Probability Theory