Bachelor's Programme 2022/2023

Data Analysis in Python

Status: Compulsory course (Chemistry)
Field of study: 04.03.01. Chemistry
Delivered by: Faculty of Chemistry
Delivered at: Faculty of Chemistry
When: 3rd year, module 1
Mode of study: with an online course
Online hours: 100
Audience: for own campus
Instructors: Тарасенко Георгий Константинович
Language: English
Credits: 4
Contact hours: 32

Course Syllabus

Abstract

The course introduces data analysis with Python. The first part covers the basics of the Python programming language. The second part introduces work with real-life data from the social sciences and international relations. The course is designed specifically for students with no prior programming experience.
Learning Objectives

  • To provide a hands-on introduction to Python and its basic applications in the field of data science.
Expected Learning Outcomes

  • Basic knowledge about the field of data science.
  • Ability to select a research question for the group project.
  • Skill of applying hypothesis testing and statistical inference.
  • Skill of applying principles of tidy data.
  • Skill of computing descriptive statistics.
  • Skill of using NumPy, SciPy, Jupyter notebooks.
  • Skill of visualizing data in Python.
Course Contents

  • Introduction to the field of data science. Examples of data science approaches applied in economics and political science. Course information on grading, prerequisites and expectations.
  • Introduction to Python. Review of the environment setup process. The Anaconda distribution. NumPy, SciPy, Jupyter notebooks.
  • Importing data to Python. Various data sources: text files, web, APIs. Raw and processed data. Working with dates. Pandas library. Merging DataFrames. The principles of tidy data. Data Quality: inaccurate data; sparse data; missing data; insufficient data; imbalanced data.
  • Descriptive statistics. Measures of location: mean, median, mode. Measures of spread: standard deviation, interquartile range, range. Percentiles. Robust statistics. Data transformations.
  • Visualizing data in Python. Matplotlib library. Scatter plot. Line chart. Histogram. Bar chart. Categorical, time series and statistical data graphics.
  • Interactive plots in Python. Introduction to plotly. Finding suitable representation of the data.
  • Basics of probability theory. Distributions, sampling, t-tests. Introductory hypothesis testing and statistical inference.
  • Introduction to linear regression. Estimation techniques. Evaluating the quality of the regression model. Model interpretation.
  • Drawbacks of the linear regression approach. Stability of the coefficients across different parts of the dataset. Rolling estimations.
  • Overfitting. Occam’s Razor principle. In-sample and out-of-sample model evaluation. Measuring predictive accuracy of the model.
  • Class wrap-up and discussion of the group project.
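
As a rough illustration of two of the topics listed above (descriptive statistics, and in-sample versus out-of-sample evaluation of a linear regression), here is a minimal sketch. It is not course material: the data are invented, the helper names `fit_ols` and `rmse` are hypothetical, and it uses only the Python standard library so that it runs without any setup, whereas the course itself names NumPy, SciPy and Pandas.

```python
import statistics

# Hypothetical toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

# Measures of location.
mean_y = statistics.mean(y)
median_y = statistics.median(y)

# Measures of spread.
stdev_y = statistics.stdev(y)             # sample standard deviation
q1, _, q3 = statistics.quantiles(y, n=4)  # quartile cut points
iqr_y = q3 - q1                           # interquartile range
range_y = max(y) - min(y)


def fit_ols(xs, ys):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the points (xs, ys)."""
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Fit on the first six points, then evaluate on the two held-out points.
intercept, slope = fit_ols(x[:6], y[:6])
rmse_in = rmse(x[:6], y[:6], intercept, slope)    # in-sample error
rmse_out = rmse(x[6:], y[6:], intercept, slope)   # out-of-sample error

print(f"mean={mean_y:.3f} median={median_y:.3f} stdev={stdev_y:.3f}")
print(f"IQR={iqr_y:.3f} range={range_y:.3f}")
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"RMSE in-sample={rmse_in:.3f} out-of-sample={rmse_out:.3f}")
```

Comparing `rmse_in` with `rmse_out` is one simple way to see the overfitting idea from the list: a model that fits its training points much better than held-out points is likely too closely tuned to the sample.
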
Assessment Elements

  • non-blocking Homework
  • non-blocking Final project
Interim Assessment

  • 2022/2023 1st module
    0.4 * Homework + 0.6 * Final project
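
The weighting above can be checked with a short calculation. The sketch below is only an illustration: the function name `interim_grade` and the example scores are hypothetical, and the syllabus does not state the grading scale or any rounding rule, so none is applied.

```python
def interim_grade(homework: float, final_project: float) -> float:
    """Weighted sum from the formula: 0.4 * homework + 0.6 * final project."""
    return 0.4 * homework + 0.6 * final_project


# Hypothetical scores, chosen only to illustrate the arithmetic.
example = interim_grade(homework=8, final_project=6)
print(f"Interim grade: {example:.1f}")
```

Because the final project carries the larger weight, a one-point change in the project score moves the result by 0.6, against 0.4 for the homework.
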
Bibliography

Recommended Core Bibliography

  • An introduction to data analysis with R; Introduction à l’analyse de données avec le logiciel R. (2019). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.BE2A1501

Recommended Additional Bibliography

  • Luke, D. A. (2015). A user’s guide to network analysis in R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edswao&AN=edswao.454121474
  • Text analysis in R. (2017). Communication Methods and Measures, 11(4), 245–265. https://doi.org/10.1080/19312458.2017.1387238