Master's programme
2020/2021
Computational Linguistics
Status:
Elective course (Machine Learning and Data Analysis)
Field of study:
01.04.02. Applied Mathematics and Informatics
Delivered by:
Department of Informatics
When:
Year 1, module 3
Mode of study:
without an online course
Programme:
Machine Learning and Data Analysis
Language:
English
Credits:
4
Contact hours:
28
Course Syllabus
Abstract
The course Applications of Computational Linguistics consists of two parts. In the first part, students are introduced to several applications of computational linguistics to social media texts: author profiling; opinion mining and irony detection; fake news and hate speech detection; fake review and paedophile detection; and text re-use and plagiarism detection. In the lab sessions, students are introduced to resources and tools for natural language processing in Python, such as NLTK and sklearn. Finally, as a project, students will be asked to work with the dataset of the 2021 shared task on Profiling hate speech spreaders on Twitter (HATERS).
Learning Objectives
- To introduce the students to a few computational linguistics applications on social media texts.
- To introduce the students to resources and tools for natural language processing in Python such as NLTK and sklearn.
Expected Learning Outcomes
- Student knows how to identify demographic characteristics of the authors of blog texts and is able to distinguish between humans and bots
- Student can conduct sentiment analysis and detect irony
- Student is able to detect fake news and hate speech
- Student knows and is able to apply the main approaches to identifying misleading content
- Student is able to detect plagiarism in texts
Course Contents
- Topic 1. Introduction to computational linguistics. Author profiling in social media.
- Topic 2. Opinion mining and irony detection
- Topic 3. Fake news and hate speech detection
- Topic 4. Social media misuse: fake reviews and paedophile detection
- Topic 5. Text re-use and plagiarism detection
Assessment Elements
- Test
- Exam: The exam is conducted as a competition between groups of students, in which each group must build a set of machine learning algorithms based on a test dataset annotated for sentiment. A set of algorithms means a combination of different classifiers, neural models, or other machine learning algorithms. Students choose their set of algorithms independently. The constructed sets of algorithms must be tested on a test collection during the competition. The results should be presented in the form of a paper draft and a PowerPoint presentation.
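As a rough illustration of what a "set of algorithms" in the exam description could look like, the sketch below combines three toy sentiment classifiers by majority vote. The word lists and the names `lexicon_classifier`, `negation_classifier`, `emoticon_classifier`, and `majority_vote` are hypothetical and are not taken from the course materials. A real submission would more likely combine trained models rather than hand-written rules.

```python
from collections import Counter

# Tiny hand-made word lists, assumed purely for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def lexicon_classifier(text):
    """Label a text by counting positive versus negative words."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def negation_classifier(text):
    """Same as the lexicon classifier, but flip the label if "not" occurs."""
    label = lexicon_classifier(text)
    if "not" in text.lower().split():
        return "neg" if label == "pos" else "pos"
    return label


def emoticon_classifier(text):
    """Label a text as negative if it contains a sad emoticon."""
    return "neg" if ":(" in text else "pos"


def majority_vote(classifiers, text):
    """Return the label predicted by most of the classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


# An odd number of binary classifiers guarantees there are no ties.
ENSEMBLE = [lexicon_classifier, negation_classifier, emoticon_classifier]
```

Calling `majority_vote(ENSEMBLE, text)` returns the label that at least two of the three classifiers agree on. The same voting step applies unchanged if the toy functions are replaced by trained models wrapped as functions from text to label.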
Bibliography
Recommended Core Bibliography
- Mitkov R. (ed.). The Oxford handbook of computational linguistics. – Oxford University Press, 2005.
Recommended Additional Bibliography
- Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
- Pozzi F. et al. Sentiment Analysis in Social Networks. - Morgan Kaufmann Publishers, 2016. - Books 24x7 electronic library system.