Computational Methods for Text Analysis

Bachelor's programme 2019/2020

Best by the criterion "Usefulness of the course for broadening horizons and well-rounded development"

Best by the criterion "Novelty of the knowledge gained"

Status: Elective course (Sociology and Social Informatics)

Field of study: 39.03.01. Sociology

Delivered by: Department of Sociology

Where: St. Petersburg School of Social Sciences

When: 3rd year, modules 1 and 2

Mode of study: without an online course

Instructor: Kirill Aleksandrovich Maslinsky (Маслинский Кирилл Александрович)

Language: English

Credits: 6

Contact hours: 48

Full Syllabus

Abstract

For social science research, written texts provide essential data for studying ideology and political discourse, conflict, sentiment, and political affiliation, among many other things. With the growing availability of large collections of text in digital form, it is tempting to scale research up in terms of the population studied (e.g. “all social media users of a town”), time span (e.g. “all of Post-Soviet history”), and geographical scope (e.g. “all educational migration in Russia”). Computational methods for text analysis promise to help at a scale where traditional content analysis is not feasible. During the course we will cover basic word statistics, various exploratory methods, and supervised and unsupervised modeling of text phenomena.

Learning Objectives

To provide a basic understanding of how to properly use collections of texts as quantitative evidence, and to make this knowledge practical.

Expected Learning Outcomes

Understanding the possibilities of automated text analysis, as well as its pitfalls and important caveats about applying statistical tests to language data.
Understanding the multidimensional representation of lexical meaning and the role of dimensionality reduction.
Being able to apply computational methods of text analysis (e.g. analysis of word frequency and co-occurrence, document classification, topic modeling) to collections of texts.
Being able to apply word embedding and clustering methods to downstream tasks, such as sentiment analysis and ideological scaling.
Being able to adequately interpret and report the results of computational text analysis in research papers.
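
As an illustration of the "word frequency and co-occurrence" outcome above, here is a minimal sketch using only the Python standard library. The three example documents, the regex tokenizer, and the function names are assumptions made for illustration; they are not taken from the course materials, and a real analysis would typically need language-specific tokenization and lemmatization.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (a naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count how many times each word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count unordered word pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        # Sort the unique words so each pair has one canonical order.
        vocab = sorted(set(tokenize(doc)))
        pairs.update(combinations(vocab, 2))
    return pairs


# Hypothetical toy corpus, for illustration only.
docs = [
    "The city council voted on the budget",
    "The council rejected the budget proposal",
    "Students moved to the city",
]

freq = word_frequencies(docs)
pairs = cooccurrences(docs)

print(freq.most_common(3))
print(pairs[("budget", "council")])
```

Counting co-occurrence at the document level is only one possible design choice; a sliding window over tokens is a common alternative when documents are long.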

Course Contents

Style — Document classification
Content — Topic modeling
Sentiment — Sentiment analysis
Structure — Entity extraction

Assessment Elements

Course participation
0.3 * paper summaries/presentations + 0.2 * in-class participation + 0.2 * homework + 0.3 * mid-term test
Final project

Interim Assessment

Interim assessment (module 2)
0.3 * Final project + 0.7 * Course participation
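
The two weighting formulas above can be combined into a short calculation. The sketch below is only an illustration: the function names are hypothetical, the component scores are made-up numbers, and the 10-point scale is an assumption rather than something stated in this syllabus.

```python
def course_participation(summaries, in_class, homework, midterm):
    """Weighted course participation score, using the weights listed above."""
    return 0.3 * summaries + 0.2 * in_class + 0.2 * homework + 0.3 * midterm


def interim_grade(final_project, participation):
    """Interim assessment: 30% final project, 70% course participation."""
    return 0.3 * final_project + 0.7 * participation


# Hypothetical component scores on an assumed 10-point scale.
participation = course_participation(summaries=8, in_class=10, homework=6, midterm=7)
grade = interim_grade(final_project=9, participation=participation)

print(round(participation, 2))
print(round(grade, 2))
```

Because course participation carries 70% of the interim grade, the mid-term test alone accounts for 0.7 * 0.3 = 21% of the total, and the final project for 30%.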

Bibliography

Recommended Core Bibliography

Bamman, D., Eisenstein, J., & Schnoebelen, T. (2014). Gender identity and lexical variation in social media. Journal of Sociolinguistics, 18(2), 135–160. https://doi.org/10.1111/josl.12080

Recommended Additional Bibliography

Jurafsky, D., Chahuneau, V., Routledge, B. R., & Smith, N. A. (2014). Narrative framing of consumer sentiment in online restaurant reviews. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.18543C32

Course Syllabus