Data analysis in Python

Academic year: 2020/2021
Language of instruction: English
ECTS credits: 4
Course type: Elective course
When: year 3, modules 3 and 4

Instructor


Silchev, Vitaly

Course Syllabus

Abstract

This course introduces the learner to the basics of data analysis using the Python programming language, common libraries, and the Jupyter Notebook environment. It covers the steps of the analytical process, from problem definition and exploratory data analysis to model selection and hyper-parameter optimization.

Learning Objectives

  • to provide an overview of the data analysis tools available in the Python ecosystem
  • to explain the data analysis pipeline
  • to practice applying analytical tools to various tasks

Expected Learning Outcomes

  • understand the steps of the analytical process
  • use basic Python modules for data analysis (numpy, pandas, matplotlib)
  • perform exploratory data analysis
  • select appropriate visualizations for data
  • build predictive models for clustering, regression and classification tasks
  • prepare a dataset before training a model
  • select an appropriate metric for model evaluation
  • tune the hyper-parameters of a model

Course Contents

  • Introduction to Data Analytics in Python
    This introductory section describes key areas such as the analytical process, how data is created, stored, and accessed, and how an organization works with data. It also covers the data analysis tools available in the Python ecosystem.
  • Descriptive Analytics
    Descriptive analytics is a preliminary stage of data processing that includes exploratory data analysis and data visualization.
  • Predictive Analytics
    This section covers basic machine learning tasks like clustering, regression and classification. It also includes the machine learning pipeline steps from feature engineering to metric selection and hyper-parameter optimization.
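The pipeline steps named in the Predictive Analytics section can be sketched roughly as follows. This is only an illustration under stated assumptions: the syllabus does not name scikit-learn, the Iris dataset, logistic regression, or accuracy, so all of these are stand-ins chosen for brevity, and simple feature scaling stands in for feature engineering.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a small built-in classification dataset stands in for course data.
X, y = load_iris(return_X_y=True)

# Hold out a test set so the final metric is computed on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Feature preparation and the model are chained so that the scaler is
# fitted only on the training folds during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyper-parameter optimization: try a few regularization strengths and
# pick the one with the best cross-validated accuracy (the chosen metric).
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.1, 1.0, 10.0]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["model__C"])
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is a deliberate choice in this sketch: it avoids leaking information from validation folds into the preprocessing step.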

Assessment Elements

  • non-blocking Final Test
  • non-blocking Intermediate Tests
  • non-blocking Programming Assignments

Interim Assessment

  • Interim assessment (module 4)
    0.3 * Final Test + 0.3 * Intermediate Tests + 0.4 * Programming Assignments
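As a worked example of the weighting formula above, the short sketch below computes the interim grade from three component scores. The function name, the dictionary keys, and the sample scores are hypothetical; the syllabus states neither the grading scale nor a rounding rule, so none is applied here.

```python
import math

# Weights taken from the interim assessment formula in the syllabus.
WEIGHTS = {
    "final_test": 0.3,
    "intermediate_tests": 0.3,
    "programming_assignments": 0.4,
}

# Sanity check: the weights should add up to 1.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def interim_grade(scores):
    """Return the weighted sum of the component scores (hypothetical helper)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example_scores = {
    "final_test": 8,
    "intermediate_tests": 6,
    "programming_assignments": 9,
}

# 0.3 * 8 + 0.3 * 6 + 0.4 * 9 = 2.4 + 1.8 + 3.6
grade = interim_grade(example_scores)
print(f"{grade:.1f}")  # prints 7.8
```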

Bibliography

Recommended Core Bibliography

  • Nelli, F. (2018). Python Data Analytics : With Pandas, NumPy, and Matplotlib (Vol. Second edition). New York, NY: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1905344
  • McKinney, W. (2017). Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython.
  • Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python : A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293

Recommended Additional Bibliography

  • Rajaraman, A., & Ullman, J. D. (2012). Mining of Massive Datasets. New York, N.Y.: Cambridge University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=408850