Introduction to collection and analysis of 'Big data'
- Study of basic notions of Big data research
- Use of basic techniques to gather Big data and analyze it
- Know basic concepts of Big data, its opportunities, limitations, and relevance to social sciences
- Know basic concepts of R programming language
- Have skills to write R code for basic data analysis tasks
- Have skills to scrape online data through various APIs, browser automation, etc.
- Have skills to analyze textual data
- Analysis of textual data in R: Basic concepts of text mining. Types of text mining. Packages (qdap, stringi, stringr, tm, quanteda, NLP, etc.). Text preprocessing. Term frequency analysis. Keyword analysis. Sentiment analysis. Topic analysis. Document clustering and classification. Introduction to advanced models (text2vec, etc.). Visualization.
- Introduction to R: What is R. Comparisons between R and SPSS, R and Stata, R and Python. Packages. Files. Variables. Data storage in R (vectors, lists, data frames, etc.). Regular expressions. Conditions. Loops. Functions. Tidyverse in R. Limitations of R.
- Introduction to Big data: What is Big data. Different understandings of the notion, its opportunities and limitations. Big data applications in various types of social studies. Cases. Biases. Ethical concerns.
- Data scraping in R: Basic information on web data (HTML, XML, HTTP, AJAX, etc.). Data retrieval via APIs. R packages for social media APIs (Twitter, Facebook, Vkontakte, etc.). Limitations of APIs. Various scenarios for data retrieval without APIs. R packages for data retrieval without APIs (rvest, httr, etc.). Browser automation for scraping dynamic pages (with the RSelenium package). Cleaning data.
- Class Attendance
- Class Participation
- Home assignment 1: Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
- Home assignment 2: Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
- Essay: A group of students (up to 4) should scrape and analyze online data from various sources on a chosen topic (for instance, news coverage of an event) and report the results in a coherent text with an introduction (research question, short literature review, and main hypotheses), a main body (analysis), a conclusion, a list of references, and the R script in an appendix. The essay should be at least 8000 characters long, excluding the appendix.
- Interim assessment (module 2): 0.12 * Class Attendance + 0.13 * Class Participation + 0.45 * Essay + 0.15 * Home assignment 1 + 0.15 * Home assignment 2
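As an illustration of how the interim assessment formula combines the components, the following R sketch computes the weighted sum for one student. The weights come from the formula above. The helper name `interim_grade` and the example grades are hypothetical and used only for illustration. The syllabus does not state a rounding rule, so the result is left unrounded.

```r
# Weights taken from the interim assessment formula; they sum to 1.
weights <- c(
  attendance    = 0.12,
  participation = 0.13,
  essay         = 0.45,
  home1         = 0.15,
  home2         = 0.15
)

# Hypothetical helper (not part of the syllabus): weighted sum of
# component grades, each on the 1-10 scale.
interim_grade <- function(grades, w = weights) {
  # Every component must be present exactly once, by name.
  stopifnot(setequal(names(grades), names(w)))
  sum(w * grades[names(w)])
}

# Hypothetical grades for one student (illustrative values only).
example_grades <- c(
  attendance = 10, participation = 8, essay = 7, home1 = 9, home2 = 6
)

interim_grade(example_grades)
```

With these assumed grades the weighted sum is about 7.64. Because the essay carries a weight of 0.45, a one-point change in the essay grade moves the result three times as much as a one-point change in either home assignment.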
- Kabacoff, R. I. (2014). R in Action: Data Analysis and Visualization in R [R в действии. Анализ и визуализация данных в программе R] (Russian edition). DMK Press. ISBN: 978-5-97060-077-1. Electronic text, EBS Lan. Retrieved from https://e.lanbook.com/book/58703
- Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Eamon Dolan/Houghton Mifflin Harcourt. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1872664
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1175341