Bachelor's programme 2019/2020

Data Analysis in Python

Field of study: 41.03.05. International Relations
When taught: 2nd year, modules 1-4
Study format: Blended
Instructors: Камротов Михаил Владимирович
Language: English
Credits: 4

Course syllabus

Course description

In this course, students are introduced to the rapidly growing field of data analytics, with a specific focus on the Python programming language. Students will learn the concepts, techniques and tools they need to draw meaningful inferences from data, and will work with real-world data sets to gain practical skills in data manipulation. Each week will involve seminars and coding simulations. In the final project, students will write working code that can be readily applied to exploratory data analysis in their own (future) research domain. To prepare for the class, students must complete the “Programming for Everybody (Getting Started with Python)” course of the University of Michigan at https://www.edx.org/course/programming-for-everybody-getting-started-with-python.
Course objective

  • To provide a hands-on introduction to Python and its basic applications in the field of data science
Learning outcomes

  • Basic knowledge of the field of data science.
  • Skill in using NumPy, SciPy and Jupyter notebooks.
  • Skill in applying the principles of tidy data.
  • Skill in computing descriptive statistics.
  • Skill in visualizing data in Python.
  • Skill in creating interactive plots with plotly.
  • Skill in applying hypothesis testing and statistical inference.
  • Skill in using linear regression.
  • Skill in applying the rolling estimation technique.
  • Skill in measuring the predictive accuracy of a model.
  • Ability to select a research question for the group project.
Course content

  • Introduction to the field of data science. Examples of data science approaches applied in economics and political science. Course information on grading, prerequisites and expectations.
  • Introduction to Python. Review of the environment setup process. Anaconda distribution. NumPy, SciPy, Jupyter notebooks.
  • Importing data into Python. Various data sources: text files, web, APIs. Raw and processed data. Working with dates. Pandas library. Merging DataFrames. The principles of tidy data. Data quality: inaccurate, sparse, missing, insufficient and imbalanced data.
  • Descriptive statistics. Measures of location: mean, median, mode. Measures of spread: standard deviation, interquartile range, range. Percentiles. Robust statistics. Data transformations.
  • Visualizing data in Python. Matplotlib library. Scatter plot. Line chart. Histogram. Bar chart. Graphics for categorical, time-series and statistical data.
  • Interactive plots in Python. Introduction to plotly. Finding a suitable representation of the data.
  • Basics of probability theory. Distributions, sampling, t-tests. Introduction to hypothesis testing and statistical inference.
  • Introduction to linear regression. Estimation techniques. Evaluating the quality of the regression model. Model interpretation.
  • Drawbacks of the linear regression approach. Stability of the coefficients across different parts of the dataset. Rolling estimations.
  • Overfitting. Occam’s Razor principle. In-sample and out-of-sample model evaluation. Measuring predictive accuracy of the model.
  • Class wrap-up and discussion of the group project.
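
The rolling-estimation and overfitting topics above can be illustrated with a short NumPy sketch. It is not course material: the synthetic data, the 50-observation window, the 20/20 holdout split and the polynomial degrees are all assumptions chosen for illustration. The first part re-estimates an OLS slope on each consecutive window and shows the estimate shifting when the underlying relationship changes; the second compares in-sample and out-of-sample error for a linear fit and a degree-8 polynomial.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible

# --- Rolling estimation ---------------------------------------------------
# Hypothetical synthetic series: the slope on x is 1.0 in the first half of
# the sample and 3.0 in the second half (a deliberate structural break).
n = 200
x = np.linspace(0.0, 10.0, n)
true_slope = np.where(np.arange(n) < n // 2, 1.0, 3.0)
y = 2.0 + true_slope * x + rng.normal(0.0, 1.0, n)

def ols_slope(xw, yw):
    """OLS slope of yw on xw, estimated with an intercept."""
    X = np.column_stack([np.ones_like(xw), xw])  # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(X, yw, rcond=None)
    return coef[1]

window = 50  # assumed window length
rolling = np.array([ols_slope(x[i:i + window], y[i:i + window])
                    for i in range(n - window + 1)])
print(f"first-window slope: {rolling[0]:.2f}, "
      f"last-window slope: {rolling[-1]:.2f}")

# --- In-sample vs out-of-sample evaluation --------------------------------
def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

xs = rng.uniform(0.0, 1.0, 40)
ys = 1.0 + 2.0 * xs + rng.normal(0.0, 0.3, 40)  # truly linear process
train, test = slice(0, 20), slice(20, 40)        # simple holdout split

results = {}
for degree in (1, 8):
    coefs = np.polyfit(xs[train], ys[train], degree)
    results[degree] = (rmse(ys[train], np.polyval(coefs, xs[train])),
                       rmse(ys[test], np.polyval(coefs, xs[test])))
    print(f"degree {degree}: in-sample RMSE {results[degree][0]:.3f}, "
          f"out-of-sample RMSE {results[degree][1]:.3f}")
```

Because the degree-8 polynomial nests the linear model, its in-sample RMSE can never be higher; on the held-out points it typically does worse, which is the gap between in-sample and out-of-sample evaluation that the overfitting topic addresses.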
Assessment elements

  • Problem set 1 (non-blocking)
  • Problem set 2 (non-blocking)
  • Problem set 3 (non-blocking)
  • Presentation of the group project (non-blocking)
Interim assessment

  • Interim assessment (module 4)
    0.4 * Presentation of the group project + 0.2 * Problem set 1 + 0.2 * Problem set 2 + 0.2 * Problem set 3
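
The weighting formula can be checked with a few lines of Python. The component keys and the sample scores below are hypothetical; the sketch assumes all four components are graded on the same scale (for example, 0 to 10) and applies no rounding, because the syllabus does not state a rounding rule.

```python
# Weights copied from the formula above; they sum to 1.0.
# The dictionary keys are hypothetical labels, not official names.
WEIGHTS = {
    "group_project_presentation": 0.4,
    "problem_set_1": 0.2,
    "problem_set_2": 0.2,
    "problem_set_3": 0.2,
}

def interim_grade(scores: dict) -> float:
    """Weighted sum of component scores; no rounding is applied."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores for one student.
example = {
    "group_project_presentation": 8,
    "problem_set_1": 7,
    "problem_set_2": 9,
    "problem_set_3": 6,
}
print(f"{interim_grade(example):.2f}")  # prints 7.60
```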