Master's programme
2019/2020
Introduction to Methods of Big Data Collection and Analysis
Best course by the criterion "Usefulness of the course for your future career"
Status:
Elective course (Comparative Social Research)
Field of study:
39.04.01. Sociology
Delivered by:
Department of Sociology
Where:
Faculty of Social Sciences
When:
Year 1, module 2
Mode of study:
without an online course
Instructors:
Aigul Maratovna Klimova (Климова Айгуль Маратовна)
Programme:
Comparative Social Research
Language:
English
Credits:
5
Contact hours:
36
Course Syllabus
Abstract
The growth of Internet penetration and the possibility of collecting and analyzing big data have created new challenges and opened new opportunities for researchers and official statistics. Within a few years, nonreactive and big data have become a major trend in the social sciences. Nonreactive methods include nonparticipant observation and the analysis of digital footprints such as likes or shares, of private documents such as blogs, social media profiles and comments, and of public online documents such as mass media materials. People post information, tweet, retweet, or share information that other people post. Social scientists can apply their experience of designing social research, as well as experimental and quasi-experimental studies, to draw valid inferences from big data. This course introduces key quantitative approaches to collecting nonreactive data in the social sciences. It is taught through lectures, seminars, and individual work. The goal of the course is to introduce the opportunities that nonreactive and big data offer social scientists and to teach basic methods and tools for collecting nonreactive data. Several R packages will be used for data analysis (R is freely available at https://www.r-project.org).
Learning Objectives
- to learn basic concepts of nonreactive data in social sciences
- to be able to collect nonreactive data in social sciences
- to learn opportunities and limitations of applying big data in social sciences
- to be able to apply big data in social sciences
Expected Learning Outcomes
- to learn basic concepts of nonreactive data in social sciences
- to learn opportunities and limitations of applying big data in social sciences
- to be able to apply big data in social sciences
- to learn text mining and network analysis in R
- to be able to collect nonreactive data in social sciences
- to be able to collect nonreactive data in social media
Course Contents
- Introduction to the course. Reactive and nonreactive methods: Nonreactive online methods. Nonparticipant observation and analysis of “digital footprints”. Big data. The typology of nonreactive data. Social media, clickstream data, tracking data. The opportunities and limitations of big data in social sciences. Ethical concerns.
- Introduction to text mining and network analysis in R: R Markdown. Regular expressions and essential string functions. Basic data visualization. Introduction to text mining. Introduction to network analysis: basic definitions, centrality measures, different approaches.
- Introduction to web scraping in R: Collecting online data. Collecting unstructured and structured data via R. Scraping web data from APIs.
- Collecting data in Vkontakte, Twitter, and Facebook: Opportunities and limitations.
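To give a flavour of how the text mining and network analysis topics fit together, here is a minimal sketch. It is not course material: the course itself works in R, and Python is used here only as an illustration. The posts, user names, and function names are all invented, and no social media API is contacted. The sketch extracts @mentions and #hashtags with regular expressions, builds an undirected mention network, and computes normalized degree centrality (number of neighbours divided by n - 1).

```python
import re
from collections import Counter, defaultdict

# Hypothetical posts as (author, text) pairs, invented for illustration only.
POSTS = [
    ("anna", "Great talk by @boris and @clara on #bigdata"),
    ("boris", "Thanks @anna! Slides soon #bigdata #rstats"),
    ("clara", "Reading list from @dmitri"),
]

MENTION = re.compile(r"@(\w+)")   # captures the user name after "@"
HASHTAG = re.compile(r"#(\w+)")   # captures the tag after "#"


def hashtag_counts(posts):
    """Count how often each hashtag appears across all posts."""
    return Counter(tag for _, text in posts for tag in HASHTAG.findall(text))


def mention_edges(posts):
    """Return the set of undirected author-mention pairs found in the posts."""
    edges = set()
    for author, text in posts:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges.add(frozenset((author, target)))
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: neighbours / (n - 1) for each node."""
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


if __name__ == "__main__":
    print(hashtag_counts(POSTS))
    print(degree_centrality(mention_edges(POSTS)))
```

In this toy network, anna and clara each have two neighbours out of a possible three, so they score higher than boris and dmitri, who have one each. In R, packages such as stringr and igraph cover the same ground.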
Interim Assessment
- Interim assessment (module 2): 0.1 * class attendance + 0.2 * class participation + 0.7 * essay
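As a worked illustration of the weighting above, the following sketch computes the interim grade from the three components. The function name `interim_grade` and the sample scores are hypothetical. The syllabus does not state the grading scale or a rounding rule, so a 10-point scale is assumed for the example and no rounding is applied.

```python
# Weights taken from the interim assessment formula in this syllabus.
WEIGHTS = {"attendance": 0.1, "participation": 0.2, "essay": 0.7}


def interim_grade(attendance: float, participation: float, essay: float) -> float:
    """Return the weighted sum of the three components (no rounding applied)."""
    return (
        WEIGHTS["attendance"] * attendance
        + WEIGHTS["participation"] * participation
        + WEIGHTS["essay"] * essay
    )


if __name__ == "__main__":
    # Hypothetical scores, assuming a 10-point scale.
    print(interim_grade(attendance=10, participation=8, essay=7))
```

With these sample scores the components contribute 1.0, 1.6, and 4.9, for a total of about 7.5. The essay dominates the result: one point on the essay moves the grade seven times as much as one point of attendance.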
Bibliography
Recommended Core Bibliography
- Torgo, L. (2017). Data Mining with R: Learning with Case Studies (2nd ed.). Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1429469
Recommended Additional Bibliography
- Gillespie, C., & Lovelace, R. (2016). Efficient R Programming: A Practical Guide to Smarter Programming. Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1435808