2022/2023

Applied Data Science

Type: Mago-Lego
When: 1 module
Online hours: 12
Open to: students of one campus
Instructors: Nadezhda Kalmykova
Language: English
ECTS credits: 3
Contact hours: 20

Course Syllabus

Abstract

Data science is the field of study that helps us find dependencies in data automatically. This technology makes it possible to solve a range of problems without explicitly programming rules. Owing to advances in computing and in the field itself, machine learning has over the last decade become an essential component of products ranging from web services to banking. In this course, students review the essential concepts of machine learning and then practice applying machine learning methods to business tasks. The course emphasizes practice and considers various aspects of solving real-world problems. The content covers popular methods such as linear models, gradient boosting, and clustering. Finally, the course considers best practices from major companies that use machine learning.
Learning Objectives

  • Learn to frame a business problem as a machine learning problem
  • Practice fitting models to solve essential machine learning problems such as regression and classification
  • Learn to design and develop machine learning systems
Expected Learning Outcomes

  • Apply the gradient boosting approach to solve classification and regression problems
  • Demonstrate main Pandas methods
  • Analyze the performance of a model and report results
  • Fit and interpret a Decision Tree model and a k Nearest Neighbors model on a given dataset
  • Identify and correctly state classification, regression, and clustering problems
  • Fit a logistic regression model and a linear regression model on a given dataset
  • Identify the suitable metric for a machine learning system
  • Describe the bagging approach to creating an ensemble of models
  • Apply transformation of raw data into features suitable for modeling
  • Apply transformation of data to improve the accuracy of the algorithm
  • Reduce the dimensionality of the original data
  • Describe the main approaches for grouping similar data points
  • Describe methods and models for time series prediction
Course Contents

  • Exploratory data analysis with Pandas
  • Visual Data Analysis
  • Classification, Decision Trees, and k Nearest Neighbors
  • Linear Classification and Regression
  • Bagging and Random Forest
  • Feature Engineering and Feature Selection
  • Unsupervised Learning: Principal Component Analysis and Clustering
  • Time Series Analysis with Python
  • Gradient Boosting
Assessment Elements

  • non-blocking Homework 1
  • non-blocking Homework 2
  • non-blocking Exam
Interim Assessment

  • 2022/2023 1st module
    0.35 * Homework 1 + 0.35 * Homework 2 + 0.3 * Exam
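
The weighting above can be sketched in Python as a quick check of how the three elements combine. This is only an illustration: the function name `interim_grade`, the dictionary keys, and the sample scores are hypothetical, and the 10-point scale used in the example is an assumption, since the syllabus states neither a grading scale nor a rounding rule.

```python
# Weights copied from the interim assessment formula in the syllabus.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {"homework_1": 0.35, "homework_2": 0.35, "exam": 0.30}


def interim_grade(scores):
    """Return the weighted sum of the three assessment elements.

    `scores` maps each key in WEIGHTS to a numeric score.
    No rounding is applied, because the syllabus does not specify one.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Sample scores on an assumed 10-point scale (illustrative values only).
example = {"homework_1": 8, "homework_2": 10, "exam": 6}
print(round(interim_grade(example), 2))  # 0.35*8 + 0.35*10 + 0.3*6 -> 8.1
```

Because both homework assignments carry the same weight and together account for 70% of the grade, the exam alone moves the result by at most 30% of the scale.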
Bibliography

Recommended Core Bibliography

  • Aurélien Géron. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow : Concepts, Tools, and Techniques to Build Intelligent Systems: Vol. Second edition. O’Reilly Media.
  • Dr. Ossama Embarak. (2018). Data Analysis and Visualization Using Python : Analyze Data to Create Visualizations for BI Systems. Apress.
  • Harish Garg. (2018). Mastering Exploratory Analysis with Pandas : Build an End-to-end Data Analysis Workflow with Python. Packt Publishing.
  • James Douglas Hamilton. (2020). Time Series Analysis. Princeton University Press.
  • Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python : A Guide for Data Scientists: Vol. First edition. O’Reilly Media.
  • Wei-Meng Lee. (2019). Python Machine Learning. John Wiley & Sons, Incorporated.
  • Yang, X.-S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.

Recommended Additional Bibliography

  • Nelli, F. (2015). Python Data Analytics : Data Analysis and Science Using Pandas, Matplotlib and the Python Programming Language. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1056488
  • Nelli, F. (2018). Python Data Analytics : With Pandas, NumPy, and Matplotlib (Vol. Second edition). New York, NY: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1905344
  • Sebastian Raschka, & Vahid Mirjalili. (2019). Python Machine Learning : Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2, 3rd Edition. Packt Publishing.