Computational Linguistics

Master's programme, 2020/2021

Status: Elective course (Machine Learning and Data Analysis)

Field of study: 01.04.02 Applied Mathematics and Informatics

Taught by: Department of Informatics

Taught at: St. Petersburg School of Physics, Mathematics, and Computer Science

When: Year 1, Module 3

Study format: without an online course

Instructors: Sergei Koltcov, Polina Panicheva, Paolo Rosso

Degree programme: Machine Learning and Data Analysis

Language: English

Credits: 4

Contact hours: 28

Full Syllabus

Abstract

The course Applications of Computational Linguistics consists of two parts. In the first part, students will be introduced to several computational linguistics applications on social media texts: author profiling; opinion mining and irony detection; fake news and hate speech detection; fake reviews and paedophile detection; and text re-use and plagiarism detection. In the second part, the lab sessions, students will be introduced to resources and tools for natural language processing in Python, such as NLTK and sklearn. Finally, as a project, students will work with the dataset of the 2021 shared task on Profiling Hate Speech Spreaders on Twitter (HATERS).
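The HATERS project is an author profiling task: a label is assigned to a Twitter user rather than to a single tweet. A minimal scikit-learn sketch of that framing follows. It assumes that each user's tweets are concatenated into one document and that the label is binary; the toy data, the hypothetical `user_document` helper, and the choice of character n-gram TF-IDF with logistic regression are illustrative assumptions, not the shared task's baseline or the course's prescribed method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each inner list holds one user's tweets.
# Label 1 = user spreads hate speech, 0 = user does not (illustrative only).
users_tweets = [
    ["Lovely sunset over the river tonight", "Great match, well played!"],
    ["Those people ruin everything, kick them out", "I hate them all"],
    ["Reading a new paper on NLP", "Coffee and code this morning"],
    ["They are vermin and should leave", "Disgusting people, I hate them"],
]
labels = [0, 1, 0, 1]


def user_document(tweets):
    """Hypothetical helper: join all of a user's tweets into one document (one sample per author)."""
    return "\n".join(tweets)


docs = [user_document(tweets) for tweets in users_tweets]

# Character n-grams within word boundaries are a common choice for noisy social-media text.
profiler = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
profiler.fit(docs, labels)

# Profile an unseen (hypothetical) user from their tweets.
new_user = ["I hate those people, they should leave"]
print(profiler.predict([user_document(new_user)]))
```

Working at the user level means every feature describes an author's whole feed, which is what distinguishes profiling from per-message classification.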

Learning Objectives

To introduce students to several computational linguistics applications on social media texts.
To introduce students to resources and tools for natural language processing in Python, such as NLTK and sklearn.

Expected Learning Outcomes

The student knows how to identify demographic characteristics of the authors of blog texts and is able to distinguish between humans and bots.
The student can perform sentiment analysis and detect irony.
The student is able to detect fake news and hate speech.
The student knows and is able to apply the main approaches to identifying misleading content.
The student is able to detect plagiarism in texts.

Course Contents

Topic 1. Introduction to computational linguistics. Author profiling in social media.
Topic 2. Opinion mining and irony detection
Topic 3. Fake news and hate speech detection
Topic 4. Social media misuse: fake reviews and paedophile detection
Topic 5. Text re-use and plagiarism detection

Assessment Elements

Test
Exam
The exam takes the form of a competition between groups of students, in which each group must build a set of machine learning algorithms using a dataset annotated for sentiment. A set of algorithms means a combination of different classifiers, neural models, or other machine learning algorithms; students choose their set of algorithms independently. The constructed sets of algorithms must be evaluated on a test collection during the competition. The results should be presented as a paper draft and a PowerPoint presentation.
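One possible "set of algorithms" in this sense is a majority-vote ensemble of several classifiers over shared TF-IDF features. The sketch below assumes scikit-learn, a binary sentiment label, and hypothetical toy data; the particular classifiers, the hard-voting scheme, and macro F1 as the evaluation metric are illustrative assumptions, not the competition's rules.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy sentiment data (1 = positive, 0 = negative); not the exam dataset.
train_texts = [
    "great movie, loved it", "wonderful acting and a great plot",
    "terrible film, waste of time", "boring and awful story",
    "I really enjoyed this", "worst movie I have seen",
]
train_labels = [1, 1, 0, 0, 1, 0]
test_texts = ["loved the plot", "awful, boring film"]
test_labels = [1, 0]

# Hard (majority) voting over three different classifiers sharing TF-IDF features.
ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", LinearSVC()),
        ],
        voting="hard",
    ),
)
ensemble.fit(train_texts, train_labels)

# Evaluate on the held-out test collection.
predicted = ensemble.predict(test_texts)
macro_f1 = f1_score(test_labels, predicted, average="macro")
print("macro F1:", macro_f1)
```

Hard voting is used because `LinearSVC` does not provide class probabilities; with three voters and two classes a tie cannot occur.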

Interim Assessment

Interim assessment (Module 3)
0.7 * Exam + 0.3 * Test
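As a worked example of the weighting, hypothetical scores Exam = 8 and Test = 6 on an assumed 10-point scale give 0.7 × 8 + 0.3 × 6 = 5.6 + 1.8 = 7.4. The same computation with a hypothetical helper `interim_grade`; rounding to two decimals only hides floating-point noise, since the syllabus does not state a rounding rule:

```python
def interim_grade(exam: float, test: float) -> float:
    """Hypothetical helper: interim grade for Module 3 = 0.7 * Exam + 0.3 * Test.

    Scores are assumed to be on a 10-point scale; rounding to two decimals only
    hides floating-point noise and is not an official rounding rule.
    """
    return round(0.7 * exam + 0.3 * test, 2)


# Hypothetical scores for illustration.
print(interim_grade(exam=8, test=6))  # prints 7.4
```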

Bibliography

Recommended Core Bibliography

Mitkov, R. (Ed.). (2005). The Oxford Handbook of Computational Linguistics. Oxford University Press.

Recommended Additional Bibliography

Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
Pozzi, F., et al. (2016). Sentiment Analysis in Social Networks. Morgan Kaufmann Publishers. Available in the Books 24x7 electronic library system.
