Introduction to Machine Learning
- The objective of the course is to develop students' comprehensive theoretical knowledge and methodological foundations in the field of machine learning, as well as practical skills for working with big data using Python.
- To know: basic terms and concepts of the Python language; basic terms and concepts of machine learning. To have practical skills in: using built-in Python libraries; developing machine learning algorithms in Python; solving business tasks that involve processing large amounts of information. To acquire basic knowledge of: tools and modern software platforms that support the implementation of machine learning algorithms.
- Introduction. Python basics. Enthought Canopy Express development environment. Basic concepts of the Python language. Running Python scripts.
- Statistics and Probability Refresher, and Python Practice: Data types. Expectation, median, mode, standard deviation, variance. Distribution functions, probability density. Percentiles and moments. Covariance and correlation. Conditional probability. Bayes' theorem.
- Predictive Models: Regression algorithms. Multilevel models.
- Machine Learning with Python: Supervised and unsupervised learning. Overfitting. Bayesian methods. Clustering. Entropy change (information gain). Decision trees. Ensemble learning. SVM. K-nearest neighbors. Dimensionality reduction. Principal component analysis (PCA).
- Recommender Systems: User-based collaborative filtering. Item-based collaborative filtering.
- Dealing with Real-World Data: K-fold cross-validation. Data cleaning and normalization. Outlier detection.
- Apache Spark Machine Learning on Big Data: The concept of Apache Spark. RDD. Introduction to MLlib.
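
To illustrate one of the topics listed above, cross-validation over K folds (blocks), here is a minimal sketch in plain Python. It is not taken from the course materials: the function names `k_fold_indices` and `k_fold_splits` are hypothetical, and the contiguous, unshuffled split is an assumption made to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_samples=10, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so a model trained and scored K times is evaluated on every sample once. In practice the data would usually be shuffled before splitting.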
- Homework assignment
- Control work
- Activity on seminars
- Online-test
  The instructions for students in the LMS.
  1. Midterm exams with asynchronous proctoring.
  Examination format: The exam is taken in written form (multiple-choice questions) with asynchronous proctoring. Asynchronous proctoring means that all of the student's actions during the exam are “watched” by the computer. The exam process is recorded and analyzed by artificial intelligence and a human proctor. Please be careful and follow the instructions exactly.
  The platform: The exam is conducted on the StartExam platform, an online platform for conducting tests of various levels of complexity. The link to the exam will be available to the student in the RUZ. Students are required to join the session 15 minutes before it begins. The computer must meet the following technical requirements: https://eduhseru-my.sharepoint.com/:b:/g/personal/vsukhomlinov_hse_ru/EUhZkYaRxQRLh9bSkXKptkUBjy7gGBj39W_pwqgqqNo_aA?e=fn0t9N
  Students must follow the requirements below:
    - Prepare an identification document (a passport open to the page with name and photo) for identification before the exam begins;
    - Check the microphone, speakers or headphones, webcam, and Internet connection (we recommend connecting the computer to the network with a cable, if possible);
    - Prepare the necessary writing equipment, such as pens, pencils, and sheets of paper;
    - Close all applications on the computer other than the browser that will be used to log in to StartExam.
  If any of the requirements for participation in the exam cannot be met, the student must inform the professor and the programme manager 2 weeks before the exam date so that a decision on the student's participation can be made.
  Students are not allowed to:
    - Turn off the video camera;
    - Use notes, textbooks, or other educational materials;
    - Leave the place where the exam is taken (go beyond the camera's viewing angle);
    - Look away from the computer screen or desktop;
    - Use smart gadgets (smartphone, tablet, etc.);
    - Involve outsiders for help or talk to outsiders during the exam;
    - Read tasks out loud.
  Students are allowed to:
    - Write on a piece of paper and use a pen for making notes and calculations;
    - Use a calculator.
  Connection failures: A short-term communication failure is the loss of the student's network connection with the StartExam platform for no longer than 1 minute. A long-term communication failure is the loss of the connection for longer than 1 minute. A long-term communication failure is grounds for terminating the exam with a grade of “unsatisfactory” (0 on a ten-point scale). In the event of a long-term communication failure during the exam, the student must notify the teacher and record the loss of connection with the platform (a screenshot or a response from the Internet provider), then contact the programme manager with an explanatory note about the incident so that a decision on retaking the exam can be made.
- Interim assessment (module 4): 0.2 * Activity on seminars + 0.1 * Attendance + 0.18 * Control work + 0.12 * Homework assignment + 0.4 * Online-test
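
As a worked illustration of the interim assessment formula above, the sketch below computes the weighted grade in Python. The weights come from the formula; the function name `interim_grade` and the example scores are hypothetical, and the syllabus does not state a rounding rule, so none is applied here.

```python
# Weights taken from the interim assessment formula.
WEIGHTS = {
    "Activity on seminars": 0.20,
    "Attendance": 0.10,
    "Control work": 0.18,
    "Homework assignment": 0.12,
    "Online-test": 0.40,
}


def interim_grade(scores):
    """Return the weighted sum of component scores (ten-point scale assumed)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, each on a ten-point scale.
example_scores = {
    "Activity on seminars": 8,
    "Attendance": 10,
    "Control work": 7,
    "Homework assignment": 9,
    "Online-test": 6,
}

grade = interim_grade(example_scores)
print(f"Interim grade: {grade:.2f}")
```

The weights sum to 1, so the result stays on the same ten-point scale as the component scores.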
- Haroon, D. (2017). Python Machine Learning Case Studies : Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
- Idris, I. (2016). Python Data Analysis Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1290098
- Baka, B. (2017). Python Data Structures and Algorithms. Birmingham, U.K.: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1528144
- Lubanovic, B. (2019). Introducing Python : Modern Computing in Simple Packages. [N.p.]: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2291494
- Vanderplas, J. T. (2016). Python Data Science Handbook : Essential Tools for Working with Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081
- Vanderplas, J.T. (2016). Python data science handbook: Essential tools for working with data. Sebastopol, CA: O’Reilly Media, Inc. https://proxylibrary.hse.ru:2119/login.aspx?direct=true&db=nlebk&AN=1425081.