Introduction to Data Science in Python

2018/2019
Academic Year
ENG
Instruction in English
2
ECTS credits
Delivered at:
eLearning Office
Course type:
Elective course
When:
2 year, 2 module

Course Syllabus

Abstract

This course introduces the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. It covers data manipulation and cleaning techniques using the popular pandas data science library and introduces the Series and DataFrame abstractions as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of the course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses. The course is a Massive Open Online Course (MOOC) delivered on the Coursera platform (https://www.coursera.org/learn/python-data-analysis). To complete the course, students are required to attend the MOOC and take an oral examination at HSE. The examination takes place during the examination weeks, after the MOOC has been completed. The full syllabus is published on the course website. The course is offered only to students of the Comparative Social Research programme.
Learning Objectives

  • to introduce the learner to the basics of the Python programming environment
  • to introduce the Series and DataFrame abstractions
  • to enable the learner to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses
Expected Learning Outcomes

  • Describe common Python functionality and features used for data science
  • Query DataFrame structures for cleaning and processing
  • Explain distributions, sampling, and t-tests
  • Understand techniques such as lambdas and manipulating CSV files
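
As an illustration of the last outcome, the following minimal sketch reads CSV data with Python's standard `csv` module and uses a lambda as a sort key. The column names and values are invented for this example and are not taken from the course materials; an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content, invented for illustration only.
# A real file would be opened with open(path, newline="") instead.
raw = io.StringIO("name,score\nAnna,8\nBoris,6\nClara,9\n")

# DictReader maps each data row to a dict keyed by the header line.
rows = list(csv.DictReader(raw))

# CSV values are read as strings, so convert the score column to int.
for row in rows:
    row["score"] = int(row["score"])

# A lambda serves as a short, anonymous key function for sorting.
ranked = sorted(rows, key=lambda row: row["score"], reverse=True)

# Names of the two highest-scoring rows.
top_names = [row["name"] for row in ranked[:2]]
print(top_names)  # ['Clara', 'Anna']
```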
Course Contents

  • Week 1
    In this week you'll get an introduction to the field of data science, review common Python functionality and features that data scientists use, and be introduced to the Coursera Jupyter Notebook used for the lectures. All of the course information on grading, prerequisites, and expectations is in the course syllabus, and you can find more information about the Jupyter Notebooks on our Course Resources page.
  • Week 2
    In this week of the course you'll learn the fundamentals of one of the most important toolkits Python has for data cleaning and processing: pandas. You'll learn how to read data into DataFrame structures, how to query these structures, and how such structures are indexed. The module ends with a programming assignment and a discussion question.
  • Week 3
    In this week you'll deepen your understanding of the Python pandas library by learning how to merge DataFrames, generate summary tables, group data into logical pieces, and manipulate dates. We'll also refresh your understanding of scales of data and discuss issues with creating metrics for analysis. The week ends with a more significant programming assignment.
  • Week 4
    In this week of the course you'll be introduced to a variety of statistical techniques such as distributions, sampling, and t-tests. The majority of the week will be dedicated to your course project, where you'll engage in a real-world data cleaning activity and provide evidence for (or against!) a given hypothesis. This project is suitable for a data science portfolio and will test your knowledge of cleaning, merging, manipulating, and testing for significance in data. The week ends with two discussions of science and the rise of the fourth paradigm: data-driven discovery.
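
The pandas operations named in Weeks 2 and 3 (querying, merging, grouping, and summary tables) can be sketched as follows. This is a minimal illustration, not course material: the two tables, their column names, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "programme": ["A", "A", "B", "B"],
})
scores = pd.DataFrame({
    "student_id": [1, 1, 2, 3, 4, 4],
    "week": [1, 2, 1, 1, 1, 2],
    "score": [80, 90, 70, 50, 70, 90],
})

# Querying: a boolean mask keeps only the rows that satisfy a condition.
high = scores[scores["score"] >= 85]

# Merging: attach each student's programme to their score rows.
merged = scores.merge(students, on="student_id", how="left")

# Grouping: mean score per programme.
mean_by_programme = merged.groupby("programme")["score"].mean()

# Summary table: mean score per programme (rows) and week (columns).
pivot = merged.pivot_table(
    index="programme", columns="week", values="score", aggfunc="mean"
)

print(mean_by_programme)
print(pivot)
```

Here `how="left"` keeps every score row even if a student were missing from the `students` table, which is often the safer default when cleaning data.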
Assessment Elements

  • MOOC certificate (partially blocks the final grade calculation)
  • Oral exam (non-blocking)
Interim Assessment

  • Interim assessment (2 module)
    After completing the MOOC, the student must present the final results (a certificate or another document, C). The document has to be submitted to the study office immediately after completion of the course. Once the course has been successfully completed, an examination is held. Submitting the certificate to the study office is a prerequisite for taking the examination. The examination grade (E) is the final grade for the course. Final control: oral group exam. The overall course grade (G) on a 10-point scale is calculated as the weighted sum G = 0.7*C + 0.3*E.
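
The grade formula can be worked through with a short sketch. It assumes that both C and E are already expressed on the 10-point scale; the function name, the range check, and the sample values are illustrative assumptions, and no rounding is applied because the syllabus does not state a rounding rule.

```python
def overall_grade(certificate: float, exam: float) -> float:
    """Return G = 0.7*C + 0.3*E.

    Assumption: both inputs are already on the 10-point scale.
    """
    for name, value in (("certificate", certificate), ("exam", exam)):
        # Range check is an illustrative assumption, not a stated rule.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} grade must be between 0 and 10")
    return 0.7 * certificate + 0.3 * exam


# Hypothetical example: C = 8 and E = 6 give 5.6 + 1.8 = 7.4.
print(round(overall_grade(8, 6), 2))  # 7.4
```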
Bibliography

Recommended Core Bibliography

  • McKinney, W. (2017). Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython.
  • VanderPlas, J. T. (2016). Python Data Science Handbook: Essential Tools for Working with Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081
  • Lutz, M. (2014). Learning Python (Russian edition: Изучаем Python).

Recommended Additional Bibliography

  • Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data [Electronic resource]. Books24x7 database. Chicago: Apress. 412 p. ISBN 978-1-4842-2387-1