Bachelor's Programme 2021/2022

Data Analysis in Politics and Journalism

Status: Elective course
Field of study: 41.03.06. Public Policy and Social Sciences
When: 3rd year, modules 1 and 2
Mode of study: without an online course
Audience: own campus only
Language: English
Credits: 3
Contact hours: 22

Course Syllabus

Abstract

In this intermediate Python course, you will learn how to apply data science methods and techniques to politics and journalism. The course develops knowledge and skills in exploratory data analysis and data visualization. The practical classes are project-oriented and cover the basic topics of applied data science. By the end of the course, you will be able to carry out your own projects in Python.

Learning Objectives

  • To provide an introduction to Python applications in politics and journalism and enable students to conduct research in a reproducible manner.

Expected Learning Outcomes

  • Ability to perform exploratory data analysis, hypothesis testing and visualization
  • Intermediate proficiency in Python libraries for data analysis and visualization (NumPy, Pandas, Matplotlib, Plotly, Scikit-Learn, etc.)
  • Knowledge and skills needed to implement one's own projects in Python

Course Contents

  • Review of Python basics, concepts and syntax for data manipulation
  • Exploratory data analysis and descriptive statistics using Python packages (Pandas, NumPy)
  • Data visualization using Matplotlib, Seaborn, Plotly
  • Hypothesis testing (t-test, z-test, etc.). Confidence intervals
  • Linear regression. Metrics for quality evaluation (MSE, RMSE, MAE, R², etc.)
  • Text preprocessing, including regular expressions, stop words, lemmatization, stemming, vectorization, TF-IDF. Topic modelling with LDA
  • k-Nearest Neighbours. Model selection, validation and analysis. Cross-validation, train-test split. Parameter tuning
  • Logistic regression. Metrics for quality evaluation (Accuracy, Precision, Recall, AUC-ROC, etc.)

Assessment Elements

  • non-blocking Home Assignment
    The assignment is a small project whose tasks draw on the Python knowledge and skills covered in the previous practical classes. In the week after the deadline or later, each student may be offered a convenient time for a Zoom meeting with the lecturer or a TA to answer questions about their code and explain their solutions.
  • non-blocking In-class Midterm Lab
    This assignment consists of a set of problems that require the knowledge and Python skills covered in the previous classes. Students have 70 minutes to complete and submit the assignment. The grade counts only if the student is present in class.
  • non-blocking Final Project
    In this assignment students conduct a small research project. The research question should be relevant to one of the following fields: political science, economics, or international relations. Group size: 2-3 students (individual projects are not allowed). The final product includes a presentation (5-7 minutes), the code, and the dataset used. During the project defence, any teammate can be asked about any line of the code or any slide. If the question is not answered, we reserve the right to annul the assignment for every member of the team. Group projects should comply with the following criteria:
      - the purpose of the study and the research question are clearly stated
      - the research methodology and results are described clearly and concisely
      - the statistical properties of the data are clearly described
      - the research outcomes are clearly defined
      - the project includes intuitive visualizations of the research outcomes
      - the contribution of each team member is clearly explained
      - the code is properly structured and reproducible
  • non-blocking In-class Final Lab
    This assignment consists of a set of problems that require the knowledge and Python skills covered in the previous classes. Students have 70 minutes to complete and submit the assignment. The grade counts only if the student is present in class.

Interim Assessment

  • 2021/2022 2nd module
    0.21 * Home Assignment + 0.29 * Final Project + 0.21 * In-class Midterm Lab + 0.29 * In-class Final Lab
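
As a rough illustration of how the formula above combines the four elements, here is a minimal Python sketch. The weights come from the formula; the function name `interim_grade` and the sample grades are hypothetical, and the sketch applies no rounding, since the syllabus does not state a rounding rule.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {
    "Home Assignment": 0.21,
    "Final Project": 0.29,
    "In-class Midterm Lab": 0.21,
    "In-class Final Lab": 0.29,
}


def interim_grade(grades):
    """Return the weighted sum of the four assessment grades (no rounding)."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


# Hypothetical grades, for illustration only.
example = {
    "Home Assignment": 8,
    "Final Project": 7,
    "In-class Midterm Lab": 6,
    "In-class Final Lab": 9,
}
print(round(interim_grade(example), 2))  # 1.68 + 2.03 + 1.26 + 2.61 = 7.58
```

The weights sum to 1, so the result stays on the same scale as the individual grades.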

Bibliography

Recommended Core Bibliography

  • Cady, F. (2017). The Data Science Handbook. Wiley.

Recommended Additional Bibliography

  • Döbler, M., & Grössmann, T. (2019). Data Visualization with Python : Create an Impact with Meaningful Data Insights Using Interactive and Engaging Visuals. Packt Publishing.