Bachelor
2020/2021
Data analysis in Python
Type:
Elective course (HSE University and University of London Parallel Degree Programme in Management and Digital Innovation)
Area of studies:
Business Informatics
Delivered by:
Bachelor's Programme in Digital Product Management
Where:
Graduate School of Business
When:
Year 3, modules 3 and 4
Mode of studies:
offline
Instructors:
Vitaly Silchev
Language:
English
ECTS credits:
4
Contact hours:
40
Course Syllabus
Abstract
This course introduces learners to the basics of data analysis using the Python programming language, common libraries, and the Jupyter Notebook environment. It covers the steps of the analytical process, from problem definition and exploratory data analysis to model selection and hyper-parameter optimisation.
Learning Objectives
- to provide an overview of the data analysis tools available in the Python ecosystem
- to introduce the data analysis pipeline
- to practice using analytical tools in a variety of tasks
Expected Learning Outcomes
- understand the steps of the analytical process
- use basic Python modules for data analysis (numpy, pandas, matplotlib)
- perform exploratory data analysis
- select appropriate visualizations for data
- build predictive models for clustering, regression and classification tasks
- prepare a dataset before training a model
- select an appropriate metric for model evaluation
- tune hyper-parameters of the model
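As an illustration of the exploratory data analysis outcome, here is a minimal sketch using numpy and pandas. The dataset is synthetic and the column names (`price`, `units_sold`, `region`) are hypothetical. They are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "price": rng.normal(loc=100.0, scale=15.0, size=200),
    "units_sold": rng.integers(low=1, high=50, size=200),
    "region": rng.choice(["north", "south", "east"], size=200),
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df[["price", "units_sold"]].describe()

# Number of missing values in each column.
missing = df.isna().sum()

# Mean price within each category of the "region" column.
mean_price_by_region = df.groupby("region")["price"].mean()

# Pairwise Pearson correlation between the numeric columns.
correlation = df[["price", "units_sold"]].corr()

print(summary)
print(missing)
print(mean_price_by_region)
print(correlation)
```

On real data the same four calls are a common first pass before choosing visualizations or preparing features.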
Course Contents
- Introduction to Data Analytics in Python: This introductory section describes key areas such as the analytical process; how data is created, stored, and accessed; and how an organization works with data. It also covers the data analysis tools available in the Python ecosystem.
- Descriptive Analytics: Descriptive analytics is a preliminary stage of data processing that includes exploratory data analysis and data visualization.
- Predictive Analytics: This section covers basic machine learning tasks such as clustering, regression and classification. It also covers the steps of the machine learning pipeline, from feature engineering to metric selection and hyper-parameter optimization.
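The predictive pipeline described in the last item can be sketched as follows. This assumes scikit-learn, which the syllabus does not name explicitly. The synthetic data, the logistic regression model, the F1 metric and the grid of `C` values are all illustrative choices, not the course's prescribed method.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set so the final metric is computed on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data preparation (scaling) and the model are chained in one pipeline,
# so scaling is re-fitted inside every cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyper-parameter optimization: grid search over the regularization
# strength C, scored with F1 under 5-fold cross-validation.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test set.
test_f1 = f1_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test F1:", test_f1)
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training, which is why the preparation step is not applied to the whole dataset up front.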
Interim Assessment
- Interim assessment (module 4): 0.3 * Final Test + 0.3 * Intermediate Tests + 0.4 * Programming Assignments
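The weighted formula above can be checked with a short sketch. The function name and the sample scores are hypothetical, and the code assumes all three components are graded on the same scale, which the syllabus does not state.

```python
def interim_grade(final_test, intermediate_tests, programming_assignments):
    """Weighted interim grade using the weights stated in the syllabus.

    Assumes all three inputs are on the same grading scale.
    """
    return (
        0.3 * final_test
        + 0.3 * intermediate_tests
        + 0.4 * programming_assignments
    )


# Hypothetical component scores, for illustration only.
grade = interim_grade(final_test=8, intermediate_tests=7, programming_assignments=9)
print(round(grade, 2))
```

The weights sum to 1, so equal scores on every component give that same score as the interim grade.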
Bibliography
Recommended Core Bibliography
- Nelli, F. (2018). Python Data Analytics: With Pandas, NumPy, and Matplotlib (2nd ed.). New York, NY: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1905344
- McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython.
- Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python : A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293
Recommended Additional Bibliography
- Rajaraman, A., & Ullman, J. D. (2012). Mining of Massive Datasets. New York, N.Y.: Cambridge University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=408850