Introduction to Machine Learning

Academic year: 2019/2020
Language of instruction: English
ECTS credits: 4
Delivered at: Department of Innovation and Business in Information Technologies
Course type: Elective course
When: 3rd year, modules 3 and 4

Instructors


Silchev, Vitaly

Course Syllabus

Abstract

Solving business tasks often requires processing large amounts of information. Working with big data requires command of a variety of technologies that make it possible to apply machine learning algorithms. The Python ecosystem includes a number of libraries for working with data and developing machine learning algorithms, and Python is supported by many modern platforms, such as Apache Spark and Microsoft Azure.
Learning Objectives

  • The objective of the course is to develop students' comprehensive theoretical knowledge and methodological foundations in the field of machine learning, as well as practical skills for working with big data using Python.
Expected Learning Outcomes

  • To know: basic terms and concepts of the Python language; basic terms and concepts of machine learning.
  • To have practical skills in: using Python libraries; developing machine learning algorithms in Python; solving business tasks that involve processing large amounts of information.
  • To acquire basic knowledge of: tools and modern software platforms that support the implementation of machine learning algorithms.
Course Contents

  • Introduction. Python basics.
    Enthought Canopy Express development environment. Basic concepts of the Python language. Running Python scripts.
  • Statistics and Probability Refresher, and Python Practice
    Data types. Expectation, median, mode, standard deviation, variance. Distribution functions, probability density. Percentiles and moments. Covariance and correlation. Conditional probability. Bayes' theorem.
  • Predictive Models
    Regression Algorithms. Multilevel models.
  • Machine Learning with Python
    Supervised and unsupervised learning. Overfitting. Bayesian methods. Clustering. Entropy change. Decision trees. Ensemble learning. SVM. K-nearest neighbors. Dimensionality reduction. Principal component analysis (PCA).
  • Recommender Systems
    User-Based Collaborative Filtering. Item-Based Collaborative Filtering
  • Dealing with Real-World Data
    K-fold cross-validation. Data cleaning and normalization. Outlier detection.
  • Apache Spark Machine Learning on Big Data
    The concept of Apache Spark. RDDs. Introduction to MLlib.
Assessment Elements

  • non-blocking Homework assignment
  • non-blocking Control work
  • non-blocking Activity on seminars
  • non-blocking Online-test
    Instructions for students in the LMS.
    1. Midterm exams with asynchronous proctoring.
    Examination format: The exam is written (multiple-choice questions) and taken with asynchronous proctoring. Asynchronous proctoring means that all of the student's actions during the exam are “watched” by the computer. The exam process is recorded and analyzed by artificial intelligence and a human proctor. Please be careful and follow the instructions closely!
    The platform: The exam is conducted on the StartExam platform, an online platform for conducting test tasks of various levels of complexity. The link to the exam task will be available to the student in the RUZ. Students are required to join the session 15 minutes before it begins. The computer must meet the following technical requirements: https://eduhseru-my.sharepoint.com/:b:/g/personal/vsukhomlinov_hse_ru/EUhZkYaRxQRLh9bSkXKptkUBjy7gGBj39W_pwqgqqNo_aA?e=fn0t9N
    Students are required to:
      • Prepare an identification document (a passport opened to the page with the name and photo) for identification before the examination task begins;
      • Check the microphone, speakers or headphones, webcam, and Internet connection (connecting the computer to the network with a cable is recommended, if possible);
      • Prepare the necessary writing materials, such as pens, pencils, and sheets of paper;
      • Close all applications on the computer other than the browser that will be used to log in to StartExam.
    If any of the requirements for participation in the exam cannot be met, the student must inform the professor and the program manager 2 weeks before the exam date so that a decision can be made on the student's participation in the exam.
    Students are not allowed to:
      • Turn off the video camera;
      • Use notes, textbooks, or other educational materials;
      • Leave the place where the exam task is taken (go beyond the camera's viewing angle);
      • Look away from the computer screen or desk;
      • Use smart gadgets (smartphone, tablet, etc.);
      • Involve outsiders for help or talk to outsiders during the exam;
      • Read tasks out loud.
    Students are allowed to:
      • Write on a piece of paper and use a pen for notes and calculations;
      • Use a calculator.
    Connection failures: A short-term communication failure is a loss of the student's network connection with the StartExam platform for no longer than 1 minute. A long-term communication failure is a loss of the student's network connection with the StartExam platform for longer than 1 minute. A long-term communication failure is grounds for terminating the exam and assigning the grade “unsatisfactory” (0 on a ten-point scale). In case of a long-term communication failure during the examination task, the student must notify the teacher and record the loss of connection with the platform (a screenshot or a response from the Internet provider), then contact the program manager with an explanatory note about the incident so that a decision can be made on retaking the exam.
  • non-blocking Attendance
Interim Assessment

  • Interim assessment (4 module)
    0.2 * Activity on seminars + 0.1 * Attendance + 0.18 * Control work + 0.12 * Homework assignment + 0.4 * Online-test
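
The weighted sum above can be checked with a short Python sketch. The weights are copied from the formula; the function name `interim_grade` and the example scores are hypothetical, and the sketch assumes every element is graded on the ten-point scale and that no rounding rule is applied, since the syllabus does not state one.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Activity on seminars": 0.2,
    "Attendance": 0.1,
    "Control work": 0.18,
    "Homework assignment": 0.12,
    "Online-test": 0.4,
}


def interim_grade(scores):
    """Return the weighted sum of the assessment element scores.

    Assumption: each score is on the ten-point scale; the result is
    returned unrounded because no rounding rule is given.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "Activity on seminars": 8,
    "Attendance": 10,
    "Control work": 7,
    "Homework assignment": 9,
    "Online-test": 6,
}

if __name__ == "__main__":
    print(f"Interim grade: {interim_grade(example_scores):.2f}")
```

Because the weights add up to 1, the result stays on the same ten-point scale as the individual elements, and the online test alone accounts for 40% of the grade.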
Bibliography

Recommended Core Bibliography

  • Haroon, D. (2017). Python Machine Learning Case Studies : Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
  • Idris, I. (2016). Python Data Analysis Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1290098

Recommended Additional Bibliography

  • Baka, B. (2017). Python Data Structures and Algorithms. Birmingham, U.K.: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1528144
  • Lubanovic, B. (2019). Introducing Python : Modern Computing in Simple Packages. [N.p.]: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2291494
  • Vanderplas, J. T. (2016). Python Data Science Handbook : Essential Tools for Working with Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081
  • Vanderplas, J.T. (2016). Python data science handbook: Essential tools for working with data. Sebastopol, CA: O’Reilly Media, Inc. https://proxylibrary.hse.ru:2119/login.aspx?direct=true&db=nlebk&AN=1425081.