Bachelor's programme 2022/2023

Natural Language Processing

Best course by the criterion "Usefulness of the course for broadening horizons and diverse development"
Best course by the criterion "Novelty of the knowledge gained"
Status: Elective course (Software Engineering)
Field of study: 09.03.04. Software Engineering
Delivered: 4th year, modules 1 and 2
Mode of study: without an online course
Audience: all HSE University campuses
Language: English
Credits: 8
Contact hours: 35

Course Syllabus

Abstract

The course introduces the basics of natural language processing (NLP), a dynamic interdisciplinary field. It covers the methods and approaches used in many real-world NLP applications, such as language modeling, text classification, sentiment analysis, summarization, and machine translation. Students will not only use existing NLP libraries and software packages but also learn the principles behind their design and the mathematical models that underlie modern computational linguistics. The course also involves practical programming tasks in Python and experiments with texts written in English and Russian. Prerequisites: programming skills in Python and general knowledge of linguistics.
Learning Objectives

  • To develop students' theoretical knowledge of, and practical skills in, the basics of natural language processing.
Expected Learning Outcomes

  • Apply basic approaches to word embeddings, such as count-based methods, Word2Vec, and GloVe
  • Apply classic machine learning methods such as Naive Bayes, SVM, and LR, and deep learning approaches such as FCN, CNN, and LSTM, to the text classification problem
  • Apply open-source libraries for text preprocessing, such as Natasha and NLTK. Handle the following common tasks: expanding contractions, lowercasing, removing punctuation, removing words that contain digits, removing stopwords, rephrasing text, stemming and lemmatization, and removing extra whitespace
  • Apply various text generation techniques such as N-gram LMs and neural LMs
  • Apply attention mechanisms and transformers to seq2seq problems
  • Apply special data preprocessing techniques and architectures such as BERT to the NER problem
  • Apply SDA and Semi-SDA to the domain adaptation problem
  • Apply the modern BERT architecture
  • Apply the BERT architecture and its modifications to the QA problem
  • Apply LDA, NMF, and LSA to the topic modeling problem
  • Apply various heuristic approaches to improve the quality of text generation
  • Apply modern neural network approaches to solve the problems of summarizing news and reviews
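
The text preprocessing outcome above names several cleaning steps. As a rough illustration only, a few of them (lowercasing, expanding contractions, removing punctuation, dropping words that contain digits, removing stopwords, collapsing whitespace) can be sketched with the Python standard library. The function name `preprocess` and the tiny `CONTRACTIONS` and `STOPWORDS` tables are hypothetical and are not taken from the course materials. Stemming, lemmatization, and rephrasing are left out because they need a dedicated library such as NLTK or Natasha.

```python
import string

# Hypothetical, deliberately tiny lookup tables for illustration only;
# a real pipeline would rely on fuller resources from an NLP library.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOPWORDS = {"a", "an", "the", "is", "it", "of", "to", "and", "in", "do"}

# Translation table that maps every ASCII punctuation mark to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Return a list of cleaned tokens for an English sentence."""
    # Lowercase first so the lowercase contraction keys match.
    text = text.lower()
    # Expand contractions before punctuation removal strips the apostrophes.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove punctuation.
    text = text.translate(PUNCT_TO_SPACE)
    # Splitting on whitespace also discards repeated and trailing spaces.
    tokens = text.split()
    # Drop tokens that contain at least one digit.
    tokens = [t for t in tokens if not any(ch.isdigit() for ch in t)]
    # Remove stopwords.
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("It's   the 2nd test, don't PANIC!"))
```

The order of the steps matters in this sketch: contractions are expanded before punctuation is removed, since otherwise the apostrophes they depend on would already be gone.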
Course Contents

  • Word embedding
  • Text classification
  • Text preprocessing methods
  • Language Modeling
  • Seq2seq models
  • Named Entity Recognition
  • Domain Adaptation
  • Transfer learning
  • Question Answering
  • Topic Modeling
  • Text generation
  • Text summarization
  • Style transfer
Assessment Elements

  • Text classification problem (non-blocking)
  • Named entity recognition (non-blocking)
  • Question Answering (non-blocking)
  • Text summarization (non-blocking)
Interim Assessment

  • 2022/2023 2nd module
    0.25 * Text summarization + 0.25 * Named entity recognition + 0.25 * Question Answering + 0.25 * Text classification problem
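
The formula above is an equally weighted sum of the four assessment elements. A minimal sketch of the arithmetic follows; the function name `interim_grade` and the example scores are hypothetical, and any grading scale or rounding rule is not stated in the syllabus and is therefore not modeled here.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Text summarization": 0.25,
    "Named entity recognition": 0.25,
    "Question Answering": 0.25,
    "Text classification problem": 0.25,
}


def interim_grade(scores):
    """Return the weighted sum of the four assignment scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, used only to show the calculation.
example_scores = {
    "Text summarization": 8,
    "Named entity recognition": 6,
    "Question Answering": 10,
    "Text classification problem": 8,
}
print(interim_grade(example_scores))  # 0.25 * (8 + 6 + 10 + 8) = 8.0
```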
Bibliography

Recommended Core Bibliography

  • Eisenstein, J. (2019). Introduction to Natural Language Processing.
  • Yu, C., Wang, J., Chen, Y., & Huang, M. (2019). Transfer Learning with Dynamic Adversarial Adaptation Network. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1909.08184

Recommended Additional Bibliography

  • Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing: Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.