Master's programme 2020/2021

Computational Linguistics

Status: Elective course (Machine Learning and Data Analysis)
Field of study: 01.04.02. Applied Mathematics and Informatics
When taught: 1st year, module 3
Study format: without an online course
Instructors: Кольцов Сергей Николаевич, Паничева Полина Вадимовна, Россо Паоло
Degree programme: Machine Learning and Data Analysis
Language: English
Credits: 4
Contact hours: 28

Course Syllabus

Abstract

The course Applications of Computational Linguistics is composed of two parts. In the first part, students are introduced to several computational linguistics applications on social media texts: author profiling; opinion mining and irony detection; fake news and hate speech detection; fake review and paedophile detection; and text re-use and plagiarism detection. In the lab sessions, students are introduced to resources and tools for natural language processing in Python, such as NLTK and sklearn. Finally, as a project, students will work with the dataset of the 2021 shared task on Profiling Hate Speech Spreaders on Twitter (HATERS).
Learning Objectives

  • To introduce the students to a few computational linguistics applications on social media texts.
  • To introduce the students to resources and tools for natural language processing in Python such as NLTK and sklearn.
Expected Learning Outcomes

  • Student knows how to identify demographic characteristics of the authors of blog texts and is able to distinguish between human and bot authors
  • Student can conduct sentiment analysis and detect irony
  • Student is able to detect fake news and hate speech
  • Student knows and is able to apply main approaches to identify misleading content
  • Student is able to detect plagiarism in texts
Course Contents

  • Topic 1. Introduction to computational linguistics. Author profiling in social media.
  • Topic 2. Opinion mining and irony detection
  • Topic 3. Fake news and hate speech detection
  • Topic 4. Social media misuse: fake review and paedophile detection
  • Topic 5. Text re-use and plagiarism detection
Assessment Elements

  • non-blocking Test
  • blocking Exam
    The exam is conducted as a competition between groups of students, in which each group must build a set of machine learning algorithms based on a test dataset with sentiment annotations. A set of algorithms means a combination of different classifiers, neural models, or other machine learning algorithms. Each group chooses its set of algorithms independently. The resulting sets of algorithms are evaluated on a test collection during the competition. The results should be presented as a paper draft and a PowerPoint presentation.
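As an illustration of what a "combination of different classifiers" might look like, here is a minimal sketch using sklearn, one of the tools named in the abstract. The toy texts, the labels, and the choice of Naive Bayes plus logistic regression with soft voting are all assumptions made for this example; the syllabus does not prescribe a dataset format or a particular set of algorithms.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real exam dataset is not described in the syllabus.
train_texts = [
    "I love this film, it is wonderful",
    "What a great and happy day",
    "Excellent service and friendly staff",
    "I hate this film, it is terrible",
    "What a sad and awful day",
    "Horrible service and rude staff",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# TF-IDF features feed two different classifiers; soft voting averages
# their predicted class probabilities to produce the final label.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    ),
)
ensemble.fit(train_texts, train_labels)

# Classify two unseen (also hypothetical) texts.
predictions = ensemble.predict(
    ["a wonderful and friendly film", "terrible and rude"]
)
print(list(predictions))
```

In a real competition setting the ensemble would be fitted on the provided training data and scored on the held-out test collection; other estimators, including neural models, could be swapped into the `estimators` list.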
Interim Assessment

  • Interim assessment (module 3)
    0.7 * Exam + 0.3 * Test
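The weighting above can be checked with a short calculation. The sketch below simply evaluates the stated formula; the function name and the example scores are hypothetical, and any rounding rules or the effect of the exam being a blocking element are not covered here because the syllabus does not specify them.

```python
def interim_grade(exam: float, test: float) -> float:
    """Weighted interim grade as stated in the syllabus: 0.7 * Exam + 0.3 * Test."""
    return 0.7 * exam + 0.3 * test


# Hypothetical scores, chosen only to show that the exam dominates the result.
grade = interim_grade(exam=8, test=6)
print(f"{grade:.2f}")
```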
Bibliography

Recommended Core Bibliography

  • Mitkov, R. (Ed.). (2005). The Oxford Handbook of Computational Linguistics. Oxford University Press.

Recommended Additional Bibliography

  • Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Pozzi, F., et al. (2016). Sentiment Analysis in Social Networks. Morgan Kaufmann Publishers. Books 24x7 electronic library.