Master's programme 2021/2022

Introduction to Methods of Collecting and Analyzing Big Data

Status: Compulsory course (Complex Social Analysis)
Field of study: 39.04.01 Sociology
When taught: 2nd year, module 2
Mode of study: with an online course
Audience: all HSE University campuses
Instructor: Arslanova Alina Railevna
Programme: Complex Social Analysis
Language: English
Credits: 3
Contact hours: 4

Course Syllabus

Abstract

This is an introductory course on gathering and analyzing Internet data. It focuses on two broad topics: data scraping and the analysis of textual data. The course is taught through training sessions and practical work. All teaching is conducted in English. Data analysis in the course relies on R packages; R is freely available at https://www.r-project.org.

The course builds on the following subjects:
  • Probability Theory and Mathematical Statistics
  • Methodology and Methods for Sociological Research

Students are expected to know:
  • the basic components of sociological research
  • various sampling techniques, their opportunities and limitations

The main ideas of the course are applicable in the following course:
  • Theory and Practice of Online Research

The following online courses may be helpful:
  • Shah C. Social Media Data Analytics. URL: https://www.coursera.org/learn/social-media-data-analytics (retrieved: 20.06.2018)
  • Leek J., Peng R. D., Caffo B. Getting and Cleaning Data. URL: https://www.coursera.org/learn/data-cleaning (retrieved: 20.06.2018)
  • Potapenko A., Zobnin A., Kozlova A., Yudin S., Zimovnov A. Natural Language Processing. URL: https://www.coursera.org/learn/language-processing (retrieved: 20.06.2018)

Learning Objectives

  • Study the basic notions of big data research
  • Use basic techniques to gather and analyze big data
Expected Learning Outcomes

  • Be able to analyze textual data
  • Be able to scrape online data through various APIs, browser automation, and similar techniques
  • Be able to write R code for basic data analysis tasks
  • Know the basic concepts of big data, its opportunities, limitations, and relevance to the social sciences
  • Know the basic concepts of the R programming language
Course Contents

  • Analysis of textual data in R
  • Introduction to R
  • Introduction to Big data
  • Data scraping in R
Assessment Elements

  • non-blocking Class Attendance
  • non-blocking Class Participation
  • non-blocking Home assignment 1
    Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
  • non-blocking Home assignment 2
    Each student must complete this home assignment individually. Students must submit a PDF file with answers and an R script. The assignment is graded from 1 (fail) to 10 (excellent).
  • non-blocking Essay
    For the essay, a group of up to 4 students should scrape and analyze online data from various sources on a chosen topic (for instance, news coverage of an event) and report the results in a coherent text. The text should contain an introduction (research question, short literature review, and main hypotheses), a main body (analysis), a conclusion, a list of references, and the R script in an appendix. The essay should be at least 8000 characters long, excluding the appendix.
Interim Assessment

  • 2020/2021 2nd module
    0.12 * Class Attendance + 0.13 * Class Participation + 0.15 * Home assignment 1 + 0.45 * Essay + 0.15 * Home assignment 2
  • 2021/2022 2nd module
    0.12 * Class Attendance + 0.13 * Class Participation + 0.15 * Home assignment 1 + 0.45 * Essay + 0.15 * Home assignment 2
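
As an illustration of how the weighted formula works, the following minimal Python sketch computes an interim grade from the five assessment elements. The weights are taken from the formula above. The function name `interim_grade`, the sample scores, and the assumption that every element is scored on the same 1 to 10 scale are hypothetical; the syllabus states the scale only for the home assignments and does not specify a rounding rule.

```python
# Weights copied from the interim assessment formula in this syllabus.
WEIGHTS = {
    "Class Attendance": 0.12,
    "Class Participation": 0.13,
    "Home assignment 1": 0.15,
    "Home assignment 2": 0.15,
    "Essay": 0.45,
}


def interim_grade(scores):
    """Return the weighted sum of the element scores.

    Assumption: every element is scored on a 1-10 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student (illustrative only).
example_scores = {
    "Class Attendance": 10,
    "Class Participation": 8,
    "Home assignment 1": 7,
    "Home assignment 2": 9,
    "Essay": 8,
}

# 1.20 + 1.04 + 1.05 + 1.35 + 3.60 = 8.24
print(round(interim_grade(example_scores), 2))
```

Because the weights sum to 1, the result stays on the same scale as the inputs. The essay carries 45% of the grade, so it outweighs both home assignments combined.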
Bibliography

Recommended Core Bibliography

  • Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Eamon Dolan/Houghton Mifflin Harcourt. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1872664
  • Kabacoff, R. I. (2014). R в действии. Анализ и визуализация данных в программе R [R in Action: Data Analysis and Visualization in R] (P. A. Volkova, Trans.). Moscow: DMK Press. 588 p. ISBN 978-5-97060-077-1. Electronic text, Lan electronic library system. Retrieved from https://e.lanbook.com/book/58703 (access for authorized users only)

Recommended Additional Bibliography

  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1175341