
Data Science and Machine Learning with Python

Academic Year: 2019/2020
Language of instruction: English
ECTS credits: 4
Delivered at: Department of Information Systems and Digital Infrastructure Management
Course type: Elective course
When: 3rd year, modules 1 and 2

Instructor

Course Syllabus

Abstract

Solving business tasks often requires processing large amounts of information. Working with big data calls for command of a range of technologies that make it possible to apply machine learning algorithms. Python offers a number of libraries for working with data and developing machine learning algorithms, and it is supported by many modern platforms, such as Apache Spark and Microsoft Azure.
Learning Objectives

  • The objective of the course is to develop students' comprehensive theoretical knowledge and methodological foundations in the field of machine learning, as well as practical skills for working with big data using Python.
Expected Learning Outcomes

  • To know: basic terms and concepts of the Python language; basic terms and concepts of machine learning.
  • To have practical skills in: using Python libraries; developing machine learning algorithms in Python; solving business tasks that involve processing large amounts of information.
  • To acquire basic knowledge of: tools and modern software platforms that support the implementation of machine learning algorithms.
Course Contents

  • Introduction. Python basics.
    Enthought Canopy Express development environment. Basic concepts of the Python language. Running Python scripts.
  • Statistics and Probability Refresher, and Python Practice
    Data types. Expectation, median, mode, standard deviation, variance. Distribution functions, probability density. Percentiles and moments. Covariance and correlation. Conditional probability. Bayes' theorem.
  • Predictive Models
    Regression Algorithms. Multilevel models.
  • Machine Learning with Python
    Supervised and unsupervised learning. Overfitting. Bayesian methods. Clustering. Measuring entropy. Decision trees. Ensemble learning. Support vector machines (SVM). K-nearest neighbors (KNN). Dimensionality reduction. Principal component analysis (PCA).
  • Recommender Systems
    User-based collaborative filtering. Item-based collaborative filtering.
  • Dealing with Real-World Data
    K-fold cross-validation. Data cleaning and normalization. Outlier detection.
  • Apache Spark Machine Learning on Big Data
    The concept of Apache Spark. Resilient distributed datasets (RDDs). Introduction to MLlib.
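
To give a feel for the statistics refresher topics listed above, here is a minimal sketch that uses only Python's standard `statistics` module. It is an illustration under stated assumptions, not course material: the sample data and the helper names `covariance`, `correlation`, and `bayes_posterior` are hypothetical and do not come from the syllabus.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both sample standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) by Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical sample: hours of practice and resulting test scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print("mean / median:", statistics.mean(scores), statistics.median(scores))
print("variance / std dev:", statistics.variance(scores), statistics.stdev(scores))
print("covariance:", covariance(hours, scores))
print("correlation:", correlation(hours, scores))
print("posterior:", bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

The helpers are written by hand so that the sketch runs on any Python 3 installation; in practice, libraries such as NumPy or pandas would typically be used for the same calculations.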
Assessment Elements

  • non-blocking Homework
  • non-blocking Control work
  • non-blocking Oral exam
  • non-blocking Activity on seminars
Interim Assessment

  • Interim assessment (module 2)
    0.2 * Activity on seminars + 0.18 * Control work + 0.12 * Homework + 0.5 * Oral exam
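
The weighted sum above can be checked with a short sketch. The weights are taken from the formula; the dictionary keys, the function name `interim_grade`, and the example scores are hypothetical. The syllabus does not state a grading scale or rounding rule, so the function returns the unrounded weighted sum.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "activity_on_seminars": 0.2,
    "control_work": 0.18,
    "homework": 0.12,
    "oral_exam": 0.5,
}


def interim_grade(scores):
    """Return the weighted sum of the four assessment elements.

    `scores` maps each key of WEIGHTS to the grade earned for that element.
    No rounding is applied, because the syllabus does not specify a rule.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical grades for one student.
example = {
    "activity_on_seminars": 8,
    "control_work": 7,
    "homework": 9,
    "oral_exam": 6,
}
print(interim_grade(example))
```

Because the four weights sum to 1, the result stays on the same scale as the individual grades.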
Bibliography

Recommended Core Bibliography

  • Haroon, D. (2017). Python Machine Learning Case Studies: Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
  • Idris, I. (2016). Python Data Analysis Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1290098

Recommended Additional Bibliography

  • Baka, B. (2017). Python Data Structures and Algorithms. Birmingham, U.K.: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1528144
  • Lubanovic, B. (2019). Introducing Python: Modern Computing in Simple Packages. [N.p.]: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2291494
  • Vanderplas, J. T. (2016). Python Data Science Handbook: Essential Tools for Working with Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081