Master 2023/2024

Introduction to NLP

Type: Compulsory course (Applied Linguistics and Text Analytics)
Area of studies: Fundamental and Applied Linguistics
Delivered by: School of Fundamental and Applied Linguistics
When: year 1, module 4
Mode of studies: offline
Open to: students of all HSE University campuses
Instructors: Natalya Stankevich
Master’s programme: Applied Linguistics and Text Analytics
Language: English
ECTS credits: 3
Contact hours: 40

Course Syllabus

Abstract

Natural Language Processing (NLP) makes it possible to apply machine learning algorithms to text and speech. The course covers the basics of NLP, the mathematical methods used in NLP, sentiment analysis, working with databases, etc.
Learning Objectives

  • Gain knowledge of statistical methods in Natural Language Processing.
  • Get acquainted with vector representations of data.
  • Study model building in NLP.
  • Get acquainted with NLP libraries.
Expected Learning Outcomes

  • Students can process texts using basic string manipulations, as well as sentiment analysis and topic modeling.
  • Students know the history of the discipline and its subfields.
  • Students have the skills to work with unstructured text data.
  • Students understand the concept of k-nearest neighbors classification and can write a Python program for it.
  • Students understand the concept of naive Bayes classification and can write a Python program for it.
  • Students can use Python to perform text preprocessing: word normalization (spelling correction, stemming, lemmatization, stopword removal, case folding), tokenization, and n-gram creation.
  • Students understand the transformer architecture.
  • Students are aware of different types of machine learning techniques, such as supervised and unsupervised learning.
  • Students are aware of ways to collect data by scraping web pages.
  • Students are aware of topic modeling.
  • Students are aware of two topic modeling algorithms: LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation).
  • Students can apply the basics of topic modeling, are familiar with the main approaches to text summarization, simplification, and text generation, and can write example programs in Python.
  • Students are aware of the motivations behind converting human language into mathematical structures.
  • Students are aware of the different types of vector representation techniques.
  • Students can apply the NLP transduction and induction process.
  • Students can apply CoLA, SST-2, and Winograd schemas to solve tasks.
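As an illustration of the preprocessing outcome above, here is a minimal sketch of case folding, tokenization, stopword removal, and n-gram creation using only the Python standard library. The function names and the tiny stopword list are assumptions made for this example; they are not taken from the course materials, and a real project would more likely rely on an NLP library.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Case-fold the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stopwords(tokenize("The cat sat on the mat."))
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Stemming, lemmatization, and spelling correction are left out here because they usually depend on language-specific resources.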
Course Contents

  • Introduction
  • Basic Feature Extraction Methods
  • Developing a Text Classifier
  • Collecting Text Data from the Web
  • Topic Modeling
  • Text Summarization and Text Generation
  • Vector Representation
  • Sentiment Analysis
  • Transformers for Natural Language Processing
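To give a feel for how the feature extraction, vector representation, and text classification topics fit together, the sketch below represents texts as bag-of-words count vectors and classifies a new text by k-nearest neighbors with cosine similarity. The training sentences, labels, and function names are invented for illustration and are not course data; this is one simple way to combine the ideas, not the method prescribed by the course.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as a sparse vector of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Majority label among the k training texts most similar to `text`."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(bag_of_words(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (not from the course).
TRAIN = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("loved the story", "pos"),
    ("terrible film hated it", "neg"),
    ("boring story awful acting", "neg"),
    ("hated the boring plot", "neg"),
]

prediction = knn_predict(TRAIN, "great story loved the acting", k=3)
print(prediction)
```

Cosine similarity is used instead of Euclidean distance so that longer texts are not penalized merely for containing more words.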
Assessment Elements

  • non-blocking Practical Work 1 "Basic Feature Extraction Methods"
  • non-blocking Practical Work 2 "Developing a Text Classifier"
  • non-blocking Practical Work 3 "Collecting Text Data from the Web"
  • non-blocking Practical Work 4 "Topic Modeling"
  • non-blocking Practical Work 5 "Text Summarization and Text Generation"
  • non-blocking Practical Work 6 "Vector Representation"
  • non-blocking Practical Work 7 "Sentiment Analysis"
  • non-blocking Practical Work 8 "Model Architecture of the Transformer"
  • non-blocking Practical Work 9 "NLP Task with Transformers"
  • non-blocking Activity in Lectures
  • non-blocking Creative Task
Interim Assessment

  • 2023/2024 4th module
    0.05 * Activity in Lectures + 0.05 * Creative Task + 0.1 * Practical Work 1 "Basic Feature Extraction Methods" + 0.1 * Practical Work 2 "Developing a Text Classifier" + 0.1 * Practical Work 3 "Collecting Text Data from the Web" + 0.1 * Practical Work 6 "Vector Representation" + 0.1 * Practical Work 7 "Sentiment Analysis" + 0.1 * Practical Work 8 "Model Architecture of the Transformer" + 0.1 * Practical Work 9 "NLP Task with Transformers" + 0.1 * Practical Work 4 "Topic Modeling" + 0.1 * Practical Work 5 "Text Summarization and Text Generation"
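The formula above is a weighted sum whose weights add up to 1. The following sketch shows how it could be evaluated. The dictionary keys are shorthand labels invented for this example, the scores are hypothetical values on an assumed 0-10 scale, and treating a missing element as 0 is also an assumption; the syllabus does not state a rounding rule, so none is applied.

```python
# Weights copied from the interim assessment formula; keys are shorthand labels.
WEIGHTS = {
    "lecture_activity": 0.05,
    "creative_task": 0.05,
    **{f"practical_work_{i}": 0.1 for i in range(1, 10)},
}


def interim_grade(scores):
    """Weighted sum of element scores; a missing element counts as 0 (assumption)."""
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# Hypothetical scores on an assumed 0-10 scale.
scores = {name: 8.0 for name in WEIGHTS}
scores["creative_task"] = 10.0

grade = interim_grade(scores)
print(f"Interim grade: {grade:.2f}")
```

Because each practical work carries twice the weight of the lecture activity or the creative task, the nine practical works together determine 90% of the grade.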
Bibliography

Recommended Core Bibliography

  • Beysolow, T. (2018). Applied Natural Language Processing with Python : Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
  • Indurkhya N., Damerau F. J. Handbook of natural language processing. – Chapman and Hall/CRC, 2010. – 704 pp.
  • Introduction to natural language processing, Eisenstein, J., 2019
  • Nfn Bahrawi. (2019). Online Realtime Sentiment Analysis Tweets by Utilizing Streaming API Features From Twitter. Jurnal Penelitian Pos Dan Informatika, (1), 53. https://doi.org/10.17933/jppi.2019.090105
  • Pozzi F. et al. Sentiment Analysis in Social Networks. - Morgan Kaufmann Publishers, 2016. - Books 24x7 electronic library system.
  • Speech and language processing. An introduction to natural language processing, computational lin..., Jurafsky, D., 2009
  • Transformers for machine learning : a deep dive, Kamath, U., 2022

Recommended Additional Bibliography

  • Dale R., Moisl H., Somers H. (ed.). Handbook of natural language processing. – CRC Press, 2000. – 1015 pp.
  • Natural Language Processing and Information Systems. (2017). Springer.
  • Mastering the Transformer Architecture: Developing Modern Models with Advanced Natural Language Processing Methods (in Russian), Yıldırım, S., 2022