Postgraduate studies 2020/2021

Research Problems in Natural Language Processing

Status: Elective course
Area of studies: 02.06.01. Computer and Information Sciences
Delivered at: 2nd year, 1st semester
Mode of studies: without online course
Language: English
ECTS credits: 5
Contact hours: 40

Course Syllabus

Abstract

This course covers recent advances in natural language processing. During the late 2010s, a paradigm shift occurred in NLP due to the increasing power of deep learning. We discuss neural approaches to morphological and syntactic parsing. Applications such as question answering and machine translation are introduced along with the neural networks used for these tasks. Transfer learning techniques, including language-model pre-training and domain adaptation, are presented.
Learning Objectives

  • The learning objective of the course “Research Problems in Natural Language Processing” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern NLP tasks, such as:
    • distributional semantics;
    • topic modelling;
    • sequence labelling;
    • structured learning;
    • text classification and clustering;
    • unsupervised information extraction.
Expected Learning Outcomes

  • Knowledge of models such as word embeddings, latent Dirichlet allocation, conditional random fields, structured SVMs, convolutional neural networks, and recurrent neural networks, as well as POS tagging and syntactic parsing
  • Knowledge of ongoing developments in NLP
  • Knowledge of how to design, develop, and evaluate NLP programs in the Python programming language
  • Hands-on experience with large-scale NLP problems
Course Contents

  • Introduction to NLP, basic concepts
    Basic definitions of NLP tasks and methods, a brief introduction to linguistics, evaluation metrics, and language resources.
  • Text preprocessing: tokenization, POS-tagging, syntax parsing
    Rule-based and machine learning-based tokenization and POS-tagging, constituency and dependency grammars, syntax parsing.
  • Topic modelling
    Vector space model and dimensionality reduction. Latent semantic indexing, latent Dirichlet allocation, dynamic topic models, hierarchical Dirichlet process, autoencoders.
  • Distributional semantics
    Embedding models: positive pointwise mutual information matrix decomposition, singular value decomposition, word2vec, GloVe, StarSpace, AdaGram, etc.
  • Sequence labelling
    Named entity recognition, relation and event extraction, and POS-tagging as sequence labelling tasks. Hidden Markov models, maximum-entropy Markov models, conditional random fields, recurrent neural networks.
  • Structured learning
    Syntax parsing and semantic role labelling as structured learning tasks. Structured SVMs and the structured perceptron.
  • Text classification and clustering
    Baseline methods for text classification: naïve Bayes, logistic regression, fastText, convolutional neural networks, and the hard attention mechanism for recurrent neural networks.
  • Unsupervised Information Extraction
    OpenIE paradigm. SOV triples extraction, classification and clustering. Temporal textual data analysis.
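
The distributional-semantics topic above lists PPMI matrix decomposition via SVD among the embedding methods. A minimal NumPy sketch of that pipeline, with a toy corpus and window size invented purely for illustration:

```python
import numpy as np

# Toy corpus; in practice this would be a large text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Build a word-word co-occurrence matrix with a symmetric window of 1.
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive PMI: max(0, log p(w, c) / (p(w) * p(c))), zeros where undefined.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD of the PPMI matrix gives dense low-dimensional embeddings.
U, S, _ = np.linalg.svd(ppmi)
dim = 2
embeddings = U[:, :dim] * S[:dim]
print(embeddings.shape)
```

The same count-then-factorize recipe underlies latent semantic indexing as well; word2vec and GloVe replace the explicit matrix with learned parameters.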
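
The sequence-labelling topic mentions hidden Markov models; finding the best tag sequence under an HMM is done with Viterbi decoding. A self-contained sketch in pure Python, where the tiny hand-set transition and emission probabilities are illustrative rather than trained:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs under an HMM."""
    # V[t][s] = log-probability of the best path ending in state s at step t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy POS-tagging HMM: two tags, three words (probabilities are made up).
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {
    "NOUN": {"dogs": 0.6, "bark": 0.1, "run": 0.3},
    "VERB": {"dogs": 0.1, "bark": 0.5, "run": 0.4},
}
print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))  # → ['NOUN', 'VERB']
```

Conditional random fields use the same dynamic program at decoding time; only the way the scores are parameterized and trained differs.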
Assessment Elements

  • non-blocking Homework
  • non-blocking Presence on all lectures and seminars
  • non-blocking Exam
Interim Assessment

  • Interim assessment (1 semester)
    0.5 * Exam + 0.4 * Homework + 0.1 * Presence on all lectures and seminars
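
The weighting above can be checked with a short computation; the sample component grades on a 10-point scale are invented for illustration:

```python
# Interim assessment formula: 0.5 * Exam + 0.4 * Homework + 0.1 * Presence.
def interim_grade(exam, homework, presence):
    return 0.5 * exam + 0.4 * homework + 0.1 * presence

# Hypothetical grades: exam 8, homework 9, full presence 10.
print(round(interim_grade(8, 9, 10), 1))  # → 8.6
```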
Bibliography

Recommended Core Bibliography

  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399
  • Yang Liu, & Meng Zhang. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, (1), 193. https://doi.org/10.1162/COLI_r_00312

Recommended Additional Bibliography

  • Shay Cohen. (2019). Bayesian Analysis in Natural Language Processing : Second Edition. San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157