Research Problems in Natural Language Processing

Doctoral programme 2020/2021

Status: Elective course

Field of study: 09.06.01. Informatics and Computer Engineering

Taught by: Department of Big Data and Information Retrieval

Where: Faculty of Computer Science

When: 2nd year, 1st semester

Mode of study: without an online course

Instructors: Pavel Isaakovich Braslavski

Language: English

Credits: 5

Contact hours: 40

Full Syllabus

Abstract

This course covers recent advances in natural language processing (NLP). In the late 2010s, NLP underwent a paradigm shift driven by the growing power of deep learning. We discuss neural approaches to morphological and syntactic parsing. Applications such as question answering and machine translation are introduced along with the neural networks used for these tasks. Transfer learning techniques, including language model pre-training and domain adaptation, are presented.

Learning Objectives

The learning objective of the course “Research Problems in Natural Language Processing” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern NLP tasks, such as:
• distributional semantics;
• topic modelling;
• sequence labelling;
• structured learning;
• text classification and clustering;
• unsupervised information extraction.

Expected Learning Outcomes

Knowledge of models such as word embeddings, Latent Dirichlet Allocation, conditional random fields, structured SVM, convolutional neural networks and recurrent neural networks, and of POS tagging and syntactic parsing
Knowledge of ongoing developments in NLP
Knowledge of how to design, develop and evaluate NLP programs in Python
Hands-on experience with large-scale NLP problems

Course Contents

Introduction to NLP, basic concepts
Basic definitions of NLP tasks and methods, a basic introduction to linguistics, evaluation metrics and language resources.
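As an illustration of the evaluation metrics mentioned here, the following is a minimal sketch of precision, recall and F1 computed over sets of predicted and gold items (for example, extracted named entities as (text, type) pairs). The function name `precision_recall_f1` and the example data are hypothetical, not taken from the course materials.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical entity annotations: one of three predictions has the wrong type.
    pred = {("Moscow", "LOC"), ("HSE", "ORG"), ("Pavel", "LOC")}
    gold = {("Moscow", "LOC"), ("HSE", "ORG"), ("Pavel", "PER")}
    print(precision_recall_f1(pred, gold))
```

Here two of the three predictions match the gold set, so precision, recall and F1 are all 2/3.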
Text preprocessing: tokenization, POS-tagging, syntax parsing
Rule-based and machine learning-based tokenization and POS tagging; constituency and dependency grammars; syntactic parsing.
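A rule-based tokenizer can be as simple as a single regular expression. The sketch below uses an illustrative pattern (an assumption, not the course's tokenizer): it keeps decimal numbers, hyphenated words and apostrophe forms together and emits every other punctuation mark as a separate token.

```python
import re

# Illustrative pattern, tried left to right:
#   1. decimal numbers such as 3.14
#   2. words, optionally joined by internal hyphens or apostrophes
#   3. any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"\d+\.\d+|\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Return the list of tokens found in text, left to right."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Hello, world! It's state-of-the-art: pi is 3.14."))
```

Machine learning-based tokenizers replace such hand-written rules with a classifier that decides where token boundaries fall.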
Topic modelling
Vector space model and dimensionality reduction. Latent semantic indexing, latent Dirichlet allocation, dynamic topic models, hierarchical Dirichlet process, autoencoders.
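To make the vector space model concrete, here is a minimal sketch that represents tokenized documents as sparse TF-IDF vectors and compares them with cosine similarity. The weighting (raw term frequency times log(N / df)) is one of several common variants and is an assumption here. Latent semantic indexing would go one step further and apply a truncated SVD to the resulting term-document matrix, which is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [["topic", "model", "lda"], ["topic", "model", "lsi"], ["word", "embedding"]]
    v = tfidf_vectors(docs)
    print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Documents that share terms get a positive similarity, while documents with no terms in common score zero.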
Distributional semantics
Embedding models: positive pointwise mutual information matrix decomposition, singular value decomposition, word2vec, GloVe, StarSpace, AdaGram, etc.
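The PPMI matrix can be computed directly from word-context co-occurrence counts as PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))). The sketch below assumes a symmetric context window of size 2 (a hypothetical choice) and stores only the positive entries. A dense embedding would then be obtained by factorizing this matrix, for example with a truncated SVD, which is omitted here.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs within a symmetric window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi(counts):
    """Return {(word, context): PPMI} for pairs with positive PMI."""
    total = sum(counts.values())
    word_totals, context_totals = Counter(), Counter()
    for (w, c), n in counts.items():
        word_totals[w] += n
        context_totals[c] += n
    scores = {}
    for (w, c), n in counts.items():
        # P(w, c) / (P(w) P(c)) simplifies to n * total / (count(w) * count(c)).
        pmi = math.log(n * total / (word_totals[w] * context_totals[c]))
        if pmi > 0:
            scores[(w, c)] = pmi
    return scores


if __name__ == "__main__":
    counts = cooccurrence_counts([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(ppmi(counts))
```

Pairs that never co-occur have PMI of minus infinity, so they are simply absent from the sparse result, which corresponds to a PPMI of zero.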
Sequence labelling
Named entity recognition, relation and event extraction, and POS tagging as sequence labelling tasks. Hidden Markov models, maximum entropy Markov models, conditional random fields, recurrent neural networks.
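Decoding in a hidden Markov model tagger is usually done with the Viterbi algorithm. This is a minimal sketch in log space. It assumes the model is given as plain dictionaries of start, transition and emission probabilities, and it treats missing entries as zero probability. The toy tag set and probabilities in the usage example are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for observations under an HMM."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    backpointers = [{}]
    for t in range(1, len(observations)):
        best.append({})
        backpointers.append({})
        for s in states:
            # Choose the predecessor state r that maximizes the path score into s.
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0))) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            backpointers[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    states = ["DET", "NOUN", "VERB"]
    start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
    trans_p = {"DET": {"NOUN": 0.9, "VERB": 0.1},
               "NOUN": {"VERB": 0.8, "NOUN": 0.2},
               "VERB": {"DET": 0.5, "NOUN": 0.5}}
    emit_p = {"DET": {"the": 1.0},
              "NOUN": {"dog": 0.6, "barks": 0.1},
              "VERB": {"barks": 0.7, "dog": 0.1}}
    print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))
```

The same dynamic programme, with HMM probabilities replaced by learned feature scores, is used for decoding in linear-chain conditional random fields.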
Structured learning
Syntactic parsing and semantic role labelling as structured learning tasks. Structured SVM and structured perceptron.
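A structured perceptron learns weights w so that argmax over y of w·φ(x, y) returns the correct structure. After each mistake it applies the update w ← w + φ(x, y) - φ(x, ŷ). The sketch below applies this to tag sequences with a hypothetical feature set made of word-tag emissions and tag-tag transitions. For clarity it uses exhaustive search over all sequences, which is only feasible for very short inputs. Real parsers and taggers replace that search with dynamic programming such as Viterbi or chart parsing.

```python
from collections import Counter
from itertools import product


def features(words, tags):
    """Feature counts phi(x, y): word-tag emissions and tag-tag transitions."""
    phi = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        phi[("emit", word, tag)] += 1
        phi[("trans", prev, tag)] += 1
        prev = tag
    return phi


def score(weights, phi):
    """Dot product w . phi(x, y); unseen features have weight 0."""
    return sum(weights[f] * v for f, v in phi.items())


def predict(weights, words, tagset):
    """Exhaustive argmax over all tag sequences (fine only for tiny inputs)."""
    return max(
        product(tagset, repeat=len(words)),
        key=lambda tags: score(weights, features(words, tags)),
    )


def train(data, tagset, epochs=5):
    """data: list of (words, gold_tags) pairs; returns the learned weights."""
    weights = Counter()
    for _ in range(epochs):
        for words, gold in data:
            guess = predict(weights, words, tagset)
            if list(guess) != list(gold):
                # Move towards the gold structure and away from the wrong guess.
                weights.update(features(words, gold))
                weights.subtract(features(words, guess))
    return weights
```

A structured SVM keeps the same joint feature map φ(x, y) but optimizes a margin-based loss instead of using this mistake-driven update.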
Text classification and clustering
Baseline methods for text classification: naïve Bayes, logistic regression, fastText, convolutional neural networks, hard attention mechanism for recurrent neural networks.
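The simplest of these baselines, multinomial naïve Bayes with add-one (Laplace) smoothing, fits in a few lines. The sketch below assumes pre-tokenized input, ignores words unseen in training, and is not tied to any particular library. The class name `NaiveBayes` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens) = log P(label) + sum log P(token | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # words unseen in training are skipped
                score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


if __name__ == "__main__":
    docs = [["great", "movie"], ["awful", "plot"], ["great", "acting"], ["awful", "movie"]]
    labels = ["pos", "neg", "pos", "neg"]
    print(NaiveBayes().fit(docs, labels).predict(["great", "plot"]))
```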
Unsupervised Information Extraction
OpenIE paradigm. Extraction, classification and clustering of SOV triples. Temporal textual data analysis.

Assessment Elements

Homework
Attendance at all lectures and seminars
Exam

Interim Assessment

Interim assessment (1 semester)
0.5 * Exam + 0.4 * Homework + 0.1 * Attendance at all lectures and seminars
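The interim-assessment formula can be written as a one-line function. This sketch assumes all three components are graded on the same scale (for example, 0 to 10). The syllabus does not specify rounding, so none is applied. For example, an exam grade of 8, a homework grade of 9 and full attendance of 10 give 4 + 3.6 + 1 = 8.6.

```python
def interim_grade(exam, homework, presence):
    """Weighted interim grade: 0.5 * exam + 0.4 * homework + 0.1 * presence.

    presence is the grade for attendance at lectures and seminars;
    no rounding is applied because the syllabus does not define it.
    """
    return 0.5 * exam + 0.4 * homework + 0.1 * presence


if __name__ == "__main__":
    # Hypothetical component grades on a 0-10 scale.
    print(interim_grade(exam=8, homework=9, presence=10))
```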

Bibliography

Recommended Core Bibliography

Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399
Liu, Y., & Zhang, M. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, (1), 193. https://doi.org/10.1162/COLI_r_00312

Recommended Additional Bibliography

Cohen, S. (2019). Bayesian Analysis in Natural Language Processing (2nd ed.). San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157
