Master's programme
2021/2022
Introduction to Methods of Big Data Collection and Analysis
Status:
Compulsory course (Comprehensive Social Analysis)
Field of study:
39.04.01. Sociology
Delivered by:
Department of Sociology
Where:
Faculty of Social Sciences
When:
Year 2, module 2
Mode of study:
with an online course
Open to:
all HSE University campuses
Instructors:
Alina Arslanova (Арсланова Алина Раильевна)
Programme:
Comprehensive Social Analysis
Language:
English
Credits:
3
Contact hours:
4
Course Syllabus
Abstract
This is an introductory course on gathering and analyzing Internet data. It focuses on two broad topics: data scraping and the analysis of textual data. The course is taught through training sessions and practical work, and all teaching is conducted in English. R packages are used for data analysis throughout the course (R is freely available at https://www.r-project.org).
This discipline builds on the following subjects:
- Probability Theory and Mathematical Statistics
- Methodology and Methods for Sociological Research
It requires the following knowledge and skills:
- knowing the basic components of sociological research
- knowing various sampling techniques, their opportunities and limitations
The main ideas of the discipline may be applied in the following course:
- Theory and Practice of Online Research
The following online courses may be helpful for this discipline:
- Shah C. Social Media Data Analytics. URL: https://www.coursera.org/learn/social-media-data-analytics (retrieved: 20.06.2018)
- Leek J., Peng R. D., Caffo B. Getting and Cleaning Data. URL: https://www.coursera.org/learn/data-cleaning (retrieved: 20.06.2018)
- Potapenko A., Zobnin A., Kozlova A., Yudin S., Zimovnov A. Natural Language Processing. URL: https://www.coursera.org/learn/language-processing (retrieved: 20.06.2018)
Learning Objectives
- To study the basic notions of big data research
- To use basic techniques for gathering and analyzing big data
Expected Learning Outcomes
- Have the skills to analyze textual data
- Have the skills to scrape online data through various APIs, browser automation, etc.
- Have the skills to write R code for basic data analysis tasks
- Know the basic concepts of big data, its opportunities, limitations, and relevance to the social sciences
- Know the basic concepts of the R programming language
Course Contents
- Analysis of textual data in R
- Introduction to R
- Introduction to Big data
- Data scraping in R
Assessment Elements
- Class Attendance
- Class Participation
- Home assignment 1: Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
- Home assignment 2: Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
- Essay: A group of students (up to 4) should scrape and analyze online data from various sources on a chosen topic (for instance, news coverage of an event) and report the results in a coherent text with an introduction (research question, short literature review, and main hypotheses), a main body (analysis), a conclusion, a list of references, and an R script in the appendix. The essay should be at least 8000 characters long, excluding the appendix.
Interim Assessment
- 2020/2021 2nd module: 0.12 * Class Attendance + 0.13 * Class Participation + 0.15 * Home assignment 1 + 0.45 * Essay + 0.15 * Home assignment 2
- 2021/2022 2nd module: 0.15 * Home assignment 2 + 0.12 * Class Attendance + 0.45 * Essay + 0.15 * Home assignment 1 + 0.13 * Class Participation
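As a worked illustration of the weighting formula above, the following minimal Python sketch computes the weighted sum for one student. The weights are copied from the formula. The function name `interim_grade`, the example scores, and the assumption that every element is scored on the same 10-point scale (the syllabus states this scale only for the home assignments) are hypothetical. No rounding rule is applied, because the syllabus does not specify one.

```python
# Weights copied from the interim assessment formula (identical in both years).
WEIGHTS = {
    "Class Attendance": 0.12,
    "Class Participation": 0.13,
    "Home assignment 1": 0.15,
    "Home assignment 2": 0.15,
    "Essay": 0.45,
}


def interim_grade(scores):
    """Return the weighted sum of the element scores, without rounding."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "Class Attendance": 10,
    "Class Participation": 8,
    "Home assignment 1": 7,
    "Home assignment 2": 9,
    "Essay": 8,
}

print(round(interim_grade(example_scores), 2))
```

Because the essay carries a weight of 0.45, a one-point change in the essay score moves the result by 0.45, three times as much as a one-point change in either home assignment.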
Bibliography
Recommended Core Bibliography
- Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Eamon Dolan/Houghton Mifflin Harcourt. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1872664
- Kabacoff, R. I. (2014). R в действии. Анализ и визуализация данных в программе R [R in Action: Data Analysis and Graphics with R] (P. A. Volkova, Trans.). Moscow: DMK Press. 588 p. ISBN 978-5-97060-077-1. Electronic text, Lan electronic library system. Retrieved from https://e.lanbook.com/book/58703 (access for authorized users only)
Recommended Additional Bibliography
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1175341