Bachelor 2019/2020

Data Analysis in Politics and Journalism

Area of studies: International Relations
When: 3rd year, 1st module
Mode of studies: offline
Instructors: Nick Korzhenevsky
Language: English
ECTS credits: 3
Contact hours: 20

Course Syllabus

Abstract

In this media-oriented intermediate R course, you will learn how to apply graph and network theory to social media, tokenize and quantitatively assess publications and expressions, and use mathematical methods to enhance your modeling skills. By the end of the course, you will be familiar with the basics of manipulating datasets to perform analytics in R.

Learning Objectives

  • To provide an introduction to applications of R in journalism and political science and enable students to carry out research in a reproducible fashion.

Expected Learning Outcomes

  • Ability to use the tidyverse and ggplot2 packages.
  • Ability to interpret text and media as big data.
  • Ability to identify the most and the least optimistic State of the Union (SOTU) speeches and to compare the sentiment of speeches by political party.
  • Ability to use the ‘rgraph’ library to manipulate a network structure.
  • Ability to visualize the network structure of VK friends.
  • Ability to use logistic regression.
  • Ability to evaluate a model's predictive accuracy.

Course Contents

  • Review of the basic data manipulation and visualization R packages: tidyverse, ggplot2. Summary statistics of a dataset.
  • Text and media as big data. Concepts of structuring text and assessing the sentiment of an expression.
  • Analyzing the State of the Union (SOTU) speeches of all US presidents. Identifying the most and the least optimistic speeches. Comparing the sentiment of speeches by political party. Graphing the results.
  • Introduction to graph theory. Euler’s ‘Seven Bridges of Königsberg’ problem. Using the ‘rgraph’ library to manipulate a network structure. Assigning additional properties to edges and vertices.
  • Visualizing the network structure of VK friends. Accessing a VK account from R via the API, obtaining individual account data and building a graph. Plotting and labeling the result.
  • Introduction to logistic regression. Evaluation of model significance. P-value, confidence intervals, pseudo-R-squared.
  • Evaluation of model predictive accuracy. Contingency table. ROC curve. Selecting an optimal separation threshold.
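
The last topic in the list, selecting a separation threshold from a contingency table and ROC curve, can be illustrated with a minimal sketch. It is written in Python rather than the course's R so that it runs with no packages installed. The scores and labels are invented toy data, the function names are hypothetical, and Youden's J (TPR minus FPR) is only one common criterion for an "optimal" threshold; the syllabus does not say which criterion the course uses.

```python
# Toy data, invented for illustration only: predicted probabilities from
# some fitted classifier and the corresponding true class labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]


def confusion(scores, labels, threshold):
    """Contingency table (tp, fp, tn, fn), predicting 1 when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(scores, labels):
    """One (threshold, FPR, TPR) triple per distinct score.

    Assumes both classes are present, otherwise a rate is undefined.
    """
    points = []
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, threshold)
        fpr = fp / (fp + tn)  # share of negatives wrongly flagged
        tpr = tp / (tp + fn)  # share of positives correctly flagged
        points.append((threshold, fpr, tpr))
    return points


def best_threshold(scores, labels):
    """Threshold that maximizes Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]


if __name__ == "__main__":
    t = best_threshold(scores, labels)
    print("chosen threshold:", t)
    print("contingency table (tp, fp, tn, fn):", confusion(scores, labels, t))
```

Plotting the FPR and TPR pairs from `roc_points` gives the ROC curve. A different criterion, for example one that weights false positives and false negatives unequally, would generally select a different threshold.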

Assessment Elements

  • Problem set 1 (non-blocking)
  • Problem set 2 (non-blocking)
  • Presentation of the group project (non-blocking)

Interim Assessment

  • Interim assessment (1 module)
    0.6 * Presentation of the group project + 0.2 * Problem set 1 + 0.2 * Problem set 2

Bibliography

Recommended Core Bibliography

  • Garrett, N. (2015). Textbooks for Responsible Data Analysis in Excel. Journal of Education for Business, 90(4), 169–174. https://doi.org/10.1080/08832323.2015.1007908

Recommended Additional Bibliography

  • De-Arteaga, M., & Boecking, B. (2019). Killings of social leaders in the Colombian post-conflict: Data analysis for investigative journalism. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1906.08206
  • Houston, B. (2019). Data for Journalists: A Practical Guide for Computer-Assisted Reporting (5th ed.). New York, NY: Routledge. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1989291
  • Iyoob, I. (2019). Data science vs. operations research: A comparison: Machine learning is more popular today yet it still includes OR algorithms. ISE: Industrial & Systems Engineering at Work, 51(12), 42. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=f5h&AN=139715696