Master's Programme 2022/2023

Natural Language Processing

Status: Elective course
Field of study: 01.04.02. Applied Mathematics and Informatics
When taught: 2nd year, 2nd module
Mode of study: with an online course
Online hours: 82
Open to: students of its own campus
Degree programme: Master of Data Science
Language: English
Credits: 4
Contact hours: 8

Course Syllabus

Abstract

Natural language processing (NLP) is an important field at the intersection of computer science, artificial intelligence, and linguistics, aimed at developing systems that can understand and generate natural language at the human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of NLP tasks, such as summarization, machine translation, named entity recognition, relation extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use products involving NLP on a daily basis, which makes it exciting to learn how these systems work. This course covers the main topics in NLP, ranging from text preprocessing techniques to state-of-the-art neural architectures. We hope to foster interest in the field by combining the theoretical foundations of the material with its practical applications.
Learning Objectives

  • Acquire knowledge of classical and advanced approaches to NLP, including the use of linguistic tools and the development of NLP systems.
Expected Learning Outcomes

  • Know the basic NLP tasks
  • Be able to preprocess text
  • Be able to solve a simple text classification task
  • Learn the inner workings of count-based vector representation models, including their advantages and disadvantages
  • Know how to compute a probability distribution with the softmax function commonly used in neural embedding models (see the sketch after this list)
  • Learn the technical details of the word2vec and fastText models
  • Learn the details of the extrinsic evaluation of word embedding models and be able to distinguish it from intrinsic methods
  • Learn the similarity measures most commonly used with vector representations
  • Learn the concept of language modelling and solidify knowledge of the tasks that can be solved with language models
  • Learn the inner workings of count-based language models and of smoothing methods
  • Learn how to calculate the number of model parameters in a simple manner
  • Learn how to compute the probability of a sequence given a set of model hypotheses
  • Solidify knowledge of the greedy search decoding method and of the special tokens
  • Solidify knowledge of the named entity recognition task, specifically the most commonly used IOB tagging scheme and the task's evaluation metrics
  • Understand the attention mechanism
  • Be able to compute attention scores and distinguish between attention functions
  • Be able to apply and evaluate different decoding techniques
  • Be able to identify the limitations of the encoder-decoder architecture
  • Understand the BERT architecture and its usage
  • Understand the architecture of ELMo models
  • Understand the GPT architecture
  • Compare different pre-trained models and know how they differ from one another
  • Be able to evaluate pre-trained models and know different techniques for compressing them
  • Be able to solve simple question-answering tasks
  • Know the particularities of different QA tasks
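
As a small illustration of two of the outcomes above, the sketch below shows the softmax function and cosine similarity, the similarity measure most commonly used with vector representations. It is a minimal sketch assuming NumPy is available; the three-dimensional "word vectors" are hypothetical toy values, not course materials.

    import numpy as np

    def softmax(scores):
        # Turn raw scores into a probability distribution, as done at the
        # output layer of neural embedding and language models.
        exp = np.exp(scores - np.max(scores))  # subtract max for numerical stability
        return exp / exp.sum()

    def cosine_similarity(u, v):
        # Similarity measure commonly applied to word embeddings.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # Hypothetical toy vectors, for illustration only.
    king = np.array([0.9, 0.1, 0.4])
    queen = np.array([0.85, 0.15, 0.5])

    print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099], sums to 1
    print(cosine_similarity(king, queen))      # ~0.99, i.e. very similar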
Course Contents

  • Text preprocessing and text classification (see the sketch after this list)
  • Embeddings
  • Language Modelling and Sequence Tagging
  • Machine Translation and Transformers
  • Sesame Street. Transfer Learning
  • Question Answering and Chat-bots
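
The first topic in this list can be illustrated with a minimal text classification pipeline, as shown below. The sketch assumes scikit-learn is available; the tiny corpus and its sentiment labels are hypothetical and only convey the overall shape of the task.

    # Minimal sketch: text preprocessing (tokenization + tf-idf) and classification.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great movie, loved it", "terrible plot and bad acting",
             "a wonderful, moving performance", "boring and awful"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (hypothetical data)

    # The vectorizer lowercases and tokenizes each text and builds tf-idf
    # features; the logistic regression classifier is fit on those features.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["what a wonderful movie"]))  # predicted label for unseen text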
Assessment Elements

  • Quizzes (non-blocking): weekly quizzes
  • Programming Assignments (non-blocking)
  • Final Project (non-blocking)
Interim Assessment

  • 2022/2023 2nd module
    0.5 * Programming Assignments + 0.2 * Final Project + 0.3 * Quizzes
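    For example, with hypothetical 10-point scores of 8 for Programming Assignments, 6 for the Final Project, and 7 for Quizzes, the interim grade would be 0.5 * 8 + 0.2 * 6 + 0.3 * 7 = 4 + 1.2 + 2.1 = 7.3.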
Bibliography

Recommended Core Bibliography

  • Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Liu, Y., & Zhang, M. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, 44(1), 193. https://doi.org/10.1162/COLI_r_00312

Recommended Additional Bibliography

  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399