Master 2020/2021

Applications in Computational Linguistics

Type: Elective course (Machine Learning and Data Analysis)
Area of studies: Applied Mathematics and Informatics
Delivered by: Department of Informatics
When: 1st year, module 3
Mode of studies: offline
Instructors: Paolo Rosso, Sergei Koltsov, Polina Panicheva
Master’s programme: Machine Learning and Data Analysis
Language: English
ECTS credits: 4
Contact hours: 28

Course Syllabus

Abstract

The course Applications in Computational Linguistics consists of two parts. In the lectures, students will be introduced to several computational linguistics applications on social media texts: author profiling; opinion mining and irony detection; fake news and hate speech detection; fake review and paedophile detection; and text reuse and plagiarism detection. In the lab sessions, students will be introduced to resources and tools for natural language processing in Python, such as NLTK and sklearn. Finally, as a project, students will be asked to work with the dataset of the 2021 shared task on Profiling Hate Speech Spreaders on Twitter (HATERS).
Learning Objectives

  • To introduce students to several computational linguistics applications on social media texts.
  • To introduce the students to resources and tools for natural language processing in Python such as NLTK and sklearn.
Expected Learning Outcomes

  • Student knows how to identify demographic characteristics of the authors of blog texts and is able to distinguish between human and bot authors
  • Student can conduct sentiment analysis and detect irony
  • Student is able to detect fake news and hate speech
  • Student knows and is able to apply main approaches to identify misleading content
  • Student is able to detect plagiarism in texts
Course Contents

  • Topic 1. Introduction to computational linguistics. Author profiling in social media.
  • Topic 2. Opinion mining and irony detection
  • Topic 3. Fake news and hate speech detection
  • Topic 4. Social media misuse: fake reviews and paedophile detection
  • Topic 5. Text re-use and plagiarism detection
Assessment Elements

  • non-blocking Test
  • blocking Exam
    The exam is held in the form of a competition between groups of students, in which each group must build a set of machine learning algorithms using a dataset annotated for sentiment. A set of algorithms means a combination of different classifiers, neural models, or other machine learning algorithms; students choose the set of algorithms independently. The constructed sets of algorithms must be evaluated on a test collection during the competition. The results should be presented as a draft paper and a PowerPoint presentation.
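The exam leaves the choice of combination open. As one illustration only, the sketch below combines several classifiers by hard majority voting and scores each member and the ensemble on a held-out collection with macro-averaged F1. Everything concrete here is hypothetical: the three members (`lexicon_clf`, `negation_clf`, `always_neg`), their word lists, the four sample texts, and the use of macro-F1 as the metric. In the course setting the members would be trained models, and the competition's actual metric is not stated in this syllabus.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a text to a label.
Classifier = Callable[[str], str]

# Hypothetical toy lexicons; real ensemble members would be trained models.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def lexicon_clf(text: str) -> str:
    """Predict 'pos' or 'neg' by counting lexicon words (ties go to 'pos')."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return "pos" if pos >= neg else "neg"


def negation_clf(text: str) -> str:
    """Same as lexicon_clf, but flip the label when the token 'not' occurs."""
    label = lexicon_clf(text)
    if "not" in text.lower().split():
        return "neg" if label == "pos" else "pos"
    return label


def always_neg(text: str) -> str:
    """Trivial baseline that always predicts the negative class."""
    return "neg"


def majority_vote(members: Sequence[Classifier], text: str) -> str:
    """Hard voting: return the most frequent label among the members.

    Counter.most_common orders equal counts by first occurrence, so ties
    go to the label predicted by the earliest member in the list."""
    votes = Counter(clf(text) for clf in members)
    return votes.most_common(1)[0][0]


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    pairs = list(zip(gold, pred))
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical mini test collection with gold sentiment labels.
    texts = ["I love this", "not good at all", "awful service", "great great food"]
    gold = ["pos", "neg", "neg", "pos"]
    members = [lexicon_clf, negation_clf, always_neg]
    for clf in members:
        print(clf.__name__, macro_f1(gold, [clf(t) for t in texts]))
    ensemble = [majority_vote(members, t) for t in texts]
    print("ensemble", macro_f1(gold, ensemble))
```

Hard voting is only the simplest combination rule. With fitted sklearn estimators, `sklearn.ensemble.VotingClassifier` supports both hard and soft (probability-averaging) voting, which is closer to what a lab-based solution would likely use.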
Interim Assessment

  • Interim assessment (module 3)
    0.7 * Exam + 0.3 * Test
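As a worked example of this weighting, the helper below computes the interim grade from the two components. It assumes both scores are on the same scale (for example, a 10-point scale, which is an assumption here) and applies no rounding, since the syllabus does not state a rounding rule; the name `interim_grade` is hypothetical.

```python
EXAM_WEIGHT = 0.7
TEST_WEIGHT = 0.3


def interim_grade(exam: float, test: float) -> float:
    """Interim grade for module 3: 0.7 * Exam + 0.3 * Test.

    Assumes both scores use the same scale; no rounding is applied because
    the syllabus does not specify a rounding rule."""
    return EXAM_WEIGHT * exam + TEST_WEIGHT * test


if __name__ == "__main__":
    print(interim_grade(8, 6))  # 0.7 * 8 + 0.3 * 6, i.e. about 7.4
```

For instance, an exam score of 8 and a test score of 6 give 0.7 × 8 + 0.3 × 6 = 5.6 + 1.8 = 7.4.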
Bibliography

Recommended Core Bibliography

  • Mitkov, R. (Ed.). (2005). The Oxford Handbook of Computational Linguistics. Oxford University Press.

Recommended Additional Bibliography

  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Pozzi, F., et al. (2016). Sentiment Analysis in Social Networks. Morgan Kaufmann Publishers. Available in the Books 24x7 electronic library system.