Bachelor's programme 2019/2020

Data Analysis in Politics and Journalism

Field of study: 41.03.05 International Relations
When taught: 3rd year, module 1
Mode of study: full-time
Language: English
Credits: 3

Course Syllabus

Abstract

In this media-oriented intermediate R course, you will learn how to apply graph and network theory to social media, how to tokenize publications and statements and assess them quantitatively, and how to use mathematical methods to improve your modeling skills. By the end of the course, you will be familiar with the basics of manipulating datasets to perform analytics in R.
Learning Objectives

  • To provide an introduction to applications of R in journalism and political science and enable students to carry out research in a reproducible fashion.
Expected Learning Outcomes

  • Ability to use tidyverse and ggplot2.
  • Ability to interpret text and media as big data.
  • Ability to identify the most and least optimistic State of the Union addresses (SOTUs) and to compare the sentiment of speeches by political party.
  • Ability to use the ‘rgraph’ library to manipulate a network structure.
  • Ability to visualize the network structure of a user's VK friends.
  • Ability to apply logistic regression.
  • Ability to evaluate a model's predictive accuracy.
Course Contents

  • Review of the basic R packages for data manipulation and visualization: tidyverse, ggplot2. Summary statistics of a dataset.
  • Text and media as big data. Concepts of structuring text and assessing the sentiment of an expression.
  • Analyzing the State of the Union (SOTU) speeches of all US presidents. Identifying the most and least optimistic SOTUs. Comparing the sentiment of speeches by political party. Graphing the results.
  • Introduction to graph theory. Euler’s Seven Bridges of Königsberg problem. Using the ‘rgraph’ library to manipulate a network structure. Assigning additional properties to edges and vertices.
  • Visualizing the network structure of VK friends. Accessing a VK account from R via the API, obtaining individual account data and building a graph. Plotting and labeling the result.
  • Introduction to logistic regression. Evaluating model significance: p-values, confidence intervals, pseudo-R-squared.
  • Evaluating model predictive accuracy. Contingency table. ROC curve. Selecting an optimal classification threshold.
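The sentiment topics listed above (scoring a text and comparing SOTU speeches by party) can be illustrated with a minimal lexicon-based sketch. The course works in R; this Python version is only an illustration, and the toy lexicon, the function names and the (party, text) input format are all hypothetical assumptions rather than course material.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: +1 marks a positive word, -1 a negative one.
# Published sentiment lexicons are far larger; this one is invented for illustration.
LEXICON = {
    "hope": 1, "growth": 1, "strong": 1, "prosperity": 1,
    "fear": -1, "war": -1, "crisis": -1, "decline": -1,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Net lexicon score per token (0.0 for a text with no tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0) for tok in tokens) / len(tokens)


def mean_score_by_party(speeches):
    """Average sentiment per party; `speeches` is a list of (party, text) pairs."""
    totals, counts = Counter(), Counter()
    for party, text in speeches:
        totals[party] += sentiment_score(text)
        counts[party] += 1
    return {party: totals[party] / counts[party] for party in counts}


def extremes(scores):
    """Return the (most optimistic, least optimistic) keys of a {speech_id: score} dict."""
    return max(scores, key=scores.get), min(scores, key=scores.get)
```

Normalising by token count keeps long and short speeches comparable; summing raw scores would instead favour longer speeches.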
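Euler's argument for the Seven Bridges of Königsberg, mentioned in the graph theory topic above, reduces to counting vertex degrees: a connected graph has a walk that crosses every edge exactly once only if it has zero or two odd-degree vertices. The sketch below encodes the bridges as a multigraph edge list in plain Python; the vertex labels are an illustrative assumption, and it does not use the R network library named in the course.

```python
from collections import Counter

# The seven bridges as an edge list of a multigraph (labels are illustrative):
# A = central island (Kneiphof), B = north bank, C = south bank, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edge endpoints touch each vertex (its degree)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """Euler's criterion: a walk using every edge exactly once exists iff the number
    of odd-degree vertices is 0 or 2. Connectivity is assumed here, not checked."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)
```

All four land masses have odd degree (5, 3, 3, 3), so `has_euler_walk(BRIDGES)` is `False`. Removing the bridge between B and D leaves exactly two odd vertices (A and C), and a walk becomes possible.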
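The logistic regression topic above lists pseudo-R-squared without naming a variant. A common choice is McFadden's, which compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below assumes McFadden's definition and takes already-predicted probabilities as input; it is a Python illustration, not the procedure used in the course's R sessions.

```python
import math


def logistic(z):
    """Logistic (inverse-logit) function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p (0 < p < 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo-R^2 = 1 - LL(model) / LL(null).

    The null (intercept-only) model predicts the sample share of 1s for every
    observation, so y must contain both 0s and 1s.
    """
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p_model) / ll_null
```

A model that predicts no better than the sample share scores 0, and values approach 1 as predicted probabilities move towards the observed outcomes.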
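The accuracy-evaluation topic above (contingency table, ROC curve, optimal threshold) can be sketched as follows: each distinct predicted score is tried as a cut-off, the 2×2 table gives the false and true positive rates, and one threshold is picked. Youden's J statistic (TPR minus FPR) is assumed here as the selection criterion; the course may use a different rule, and the function names are hypothetical.

```python
def confusion_counts(y, scores, threshold):
    """Contingency-table counts (TP, FP, TN, FN) when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for yi, s in zip(y, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and yi == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y, scores):
    """(threshold, FPR, TPR) for each distinct score used as a cut-off, highest first.

    Requires at least one positive and one negative case in y.
    """
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y, scores, t)
        points.append((t, fp / (fp + tn), tp / (tp + fn)))
    return points


def youden_threshold(y, scores):
    """Cut-off maximising Youden's J = TPR - FPR; ties go to the highest threshold."""
    return max(roc_points(y, scores), key=lambda pt: pt[2] - pt[1])[0]
```

Plotting the (FPR, TPR) pairs from `roc_points` traces the ROC curve; the chosen threshold is the point farthest above the diagonal.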
Assessment Elements

  • Problem set 1 (non-blocking)
  • Problem set 2 (non-blocking)
  • Presentation of the group project (non-blocking)
Interim Assessment

  • Interim assessment (module 1)
    0.6 * Presentation of the group project + 0.2 * Problem set 1 + 0.2 * Problem set 2
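The interim assessment formula above is a weighted sum, sketched below. The dictionary keys are hypothetical names for the three elements, all grades are assumed to be on the same scale, and no rounding is applied because the syllabus does not state a rounding rule.

```python
# Weights taken from the interim assessment formula; the key names are hypothetical.
WEIGHTS = {
    "group_project_presentation": 0.6,
    "problem_set_1": 0.2,
    "problem_set_2": 0.2,
}


def interim_grade(grades):
    """Weighted sum of element grades, all assumed to be on the same scale.

    No rounding rule is stated in the syllabus, so the result is left unrounded.
    """
    return sum(weight * grades[name] for name, weight in WEIGHTS.items())
```

For example, grades of 8 for the presentation, 6 for problem set 1 and 10 for problem set 2 give 0.6·8 + 0.2·6 + 0.2·10 = 4.8 + 1.2 + 2.0 = 8.0 (up to floating-point error).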
Bibliography

Recommended Core Bibliography

  • Garrett, N. (2015). Textbooks for Responsible Data Analysis in Excel. Journal of Education for Business, 90(4), 169–174. https://doi.org/10.1080/08832323.2015.1007908

Recommended Additional Bibliography

  • De-Arteaga, M., & Boecking, B. (2019). Killings of social leaders in the Colombian post-conflict: Data analysis for investigative journalism. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1906.08206
  • Houston, B. (2019). Data for Journalists: A Practical Guide for Computer-Assisted Reporting (5th ed.). New York, NY: Routledge. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1989291
  • Iyoob, I. (2019). Data science vs. operations research: A comparison: Machine learning is more popular today yet it still includes OR algorithms. ISE: Industrial & Systems Engineering at Work, 51(12), 42. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=f5h&AN=139715696