Master's Programme 2022/2023

Natural Language Processing

Status: Elective course
Field of study: 01.04.02. Applied Mathematics and Informatics
When taught: 2nd year, 2nd module
Mode of study: with an online course
Online hours: 82
Open to: students of its own campus
Degree programme: Master of Data Science
Language: English
Credits: 4
Contact hours: 8

Course Syllabus

Abstract

Natural language processing (NLP) is an important field at the intersection of computer science, artificial intelligence, and linguistics, aimed at developing systems that can understand and generate natural language at the human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of NLP tasks, such as summarization, machine translation, named entity recognition, relation extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use products involving NLP on a daily basis, which makes it exciting to learn how these systems work. This course covers the main topics in NLP, ranging from text preprocessing techniques to state-of-the-art neural architectures. We hope to foster interest in the field by combining the theoretical foundations of the material with its practical applications.
Learning Objectives

  • Acquire knowledge of classical and advanced approaches to NLP, including the use of linguistic tools and the development of NLP systems.
Expected Learning Outcomes

  • Know the basic NLP tasks
  • Be able to preprocess text
  • Be able to solve a simple text classification task
  • Learn the inner workings of count-based vector representation models, including their advantages and disadvantages
  • Know how to compute a probability distribution with the softmax function commonly used in neural embedding models (see the sketch after this list)
  • Learn the technical details of the word2vec and fastText models
  • Learn the details of the extrinsic evaluation of word embedding models and be able to distinguish it from intrinsic methods
  • Learn the similarity measures most commonly used with vector representations
  • Learn the concept of language modelling and solidify knowledge of the tasks that can be solved with language models
  • Learn the inner workings of count-based language models and of smoothing methods
  • Learn how to calculate the number of model parameters in a simple manner
  • Learn how to compute the probability of a sequence given a set of model hypotheses
  • Solidify knowledge of the greedy search decoding method and of the special tokens
  • Solidify knowledge of the named entity recognition task, specifically the most commonly used IOB tagging scheme and the task's evaluation metrics
  • Understand the attention mechanism
  • Be able to compute attention scores and distinguish between attention functions
  • Be able to apply and evaluate different decoding techniques
  • Be able to identify the limitations of the encoder-decoder architecture
  • Understand the BERT architecture and its usage
  • Understand the architecture of ELMo models
  • Understand the GPT architecture
  • Compare different pre-trained models and know how they differ from one another
  • Be able to evaluate pre-trained models and know different techniques for compressing them
  • Be able to solve simple question-answering tasks
  • Know the particularities of different QA tasks
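
As a small illustration of two of the outcomes above, the sketch below shows the softmax function and cosine similarity, the similarity measure most commonly used with vector representations. It is a minimal sketch assuming NumPy is available; the three-dimensional "word vectors" are hypothetical toy values, not course materials.

    import numpy as np

    def softmax(scores):
        # Turn raw scores into a probability distribution, as done at the
        # output layer of neural embedding and language models.
        exp = np.exp(scores - np.max(scores))  # subtract max for numerical stability
        return exp / exp.sum()

    def cosine_similarity(u, v):
        # Similarity measure commonly applied to word embeddings.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # Hypothetical toy vectors, for illustration only.
    king = np.array([0.9, 0.1, 0.4])
    queen = np.array([0.85, 0.15, 0.5])

    print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099], sums to 1
    print(cosine_similarity(king, queen))      # ~0.99, i.e. very similar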
Course Contents

  • Text preprocessing and text classification (see the sketch after this list)
  • Embeddings
  • Language Modelling and Sequence Tagging
  • Machine Translation and Transformers
  • Sesame Street. Transfer Learning
  • Question Answering and Chat-bots
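
The first topic in this list can be illustrated with a minimal text classification pipeline, as shown below. The sketch assumes scikit-learn is available; the tiny corpus and its sentiment labels are hypothetical and only convey the overall shape of the task.

    # Minimal sketch: text preprocessing (tokenization + tf-idf) and classification.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great movie, loved it", "terrible plot and bad acting",
             "a wonderful, moving performance", "boring and awful"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (hypothetical data)

    # The vectorizer lowercases and tokenizes each text and builds tf-idf
    # features; the logistic regression classifier is fit on those features.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["what a wonderful movie"]))  # predicted label for unseen text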
Assessment Elements

  • Quizzes (non-blocking): weekly quizzes
  • Programming Assignments (non-blocking)
  • Final Project (non-blocking)
Interim Assessment

  • 2022/2023 2nd module
    0.5 * Programming Assignments + 0.2 * Final Project + 0.3 * Quizzes
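    For example, with hypothetical 10-point scores of 8 for Programming Assignments, 6 for the Final Project, and 7 for Quizzes, the interim grade would be 0.5 * 8 + 0.2 * 6 + 0.3 * 7 = 4 + 1.2 + 2.1 = 7.3.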
Bibliography

Recommended Core Bibliography

  • Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Liu, Y., & Zhang, M. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, 44(1), 193. https://doi.org/10.1162/COLI_r_00312

Recommended Additional Bibliography

  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399