Postgraduate course 2018/2019

Research Problems in Natural Language Processing

Type: Elective course
Area of studies: Computer and Information Science
When: 1 year, 1 semester
Mode of studies: Full time
Instructors: Ekaterina Artemova
Language: English
ECTS credits: 4

Course Syllabus

Abstract

This course covers recent advances in natural language processing. During the late 2010s, a paradigm shift occurred in NLP due to the increasing power of deep learning. We discuss neural approaches to morphological and syntax parsing. Applications such as question answering and machine translation are introduced along with the neural networks used for these tasks. Transfer learning techniques, including language model pre-training and domain adaptation, are presented.
Learning Objectives

  • The learning objective of the course “Research Problems in Natural Language Processing” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern NLP tasks, such as:
    • distributional semantics;
    • topic modelling;
    • sequence labelling;
    • structured learning;
    • text classification and clustering;
    • unsupervised information extraction.
Expected Learning Outcomes

  • Knowledge of models such as word embeddings, latent Dirichlet allocation, conditional random fields, structured SVMs, convolutional and recurrent neural networks, and of tasks such as POS-tagging and syntax parsing
  • Knowledge about ongoing developments in NLP
  • Hands-on experience with large scale NLP problems
  • Knowledge of how to design, develop and evaluate NLP programs using the Python programming language
Course Contents

  • Introduction to NLP, basic concepts
    Basic definitions of NLP tasks and methods, a brief introduction to linguistics, evaluation metrics and language resources.
  • Text preprocessing: tokenization, POS-tagging, syntax parsing
    Rule-based and machine learning-based tokenization and POS-tagging, constituency and dependency grammars, syntax parsing.
  • Topic modelling
    Vector space model and dimensionality reduction. Latent semantic indexing, latent Dirichlet allocation, dynamic topic models, hierarchical Dirichlet process, autoencoders.
  • Distributional semantics
    Embedding models: positive pointwise mutual information matrix decomposition, singular value decomposition, word2vec, GloVe, StarSpace, AdaGram, etc.
  • Sequence labelling
    Named entity recognition, relation and event extraction and POS-tagging as sequence labelling tasks. Hidden Markov model, maximum entropy Markov model, conditional random fields, recurrent neural networks.
  • Structured learning
    Syntax parsing and semantic role labelling as structured learning tasks. Structured SVM and structured perceptron.
  • Text classification and clustering
    Baseline methods for text classification: naïve Bayes, logistic regression, fastText, convolutional neural networks, hard attention mechanism for recurrent neural networks.
  • Unsupervised Information Extraction
    OpenIE paradigm. SOV triples extraction, classification and clustering. Temporal textual data analysis.
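To give a flavour of the material, the PPMI weighting mentioned under “Distributional semantics” can be sketched in a few lines of Python. This is a toy illustration only: the two-sentence corpus and the window size of 2 are assumptions made for the example, not part of the syllabus.

```python
# Toy PPMI sketch: PPMI(w, c) = max(0, log( p(w, c) / (p(w) * p(c)) )),
# with probabilities estimated from co-occurrence counts in a small corpus.
import math
from collections import Counter

# Hypothetical two-sentence corpus, purely for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

window = 2  # symmetric context window size (an assumption for this sketch)
pair_counts = Counter()
word_counts = Counter()

for sentence in corpus:
    for i, w in enumerate(sentence):
        word_counts[w] += 1
        # Count co-occurrences within the window, excluding the word itself.
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i != j:
                pair_counts[(w, sentence[j])] += 1

total_pairs = sum(pair_counts.values())
total_words = sum(word_counts.values())

def ppmi(w, c):
    """Positive pointwise mutual information of a word-context pair."""
    joint = pair_counts[(w, c)] / total_pairs
    if joint == 0:
        return 0.0
    pw = word_counts[w] / total_words
    pc = word_counts[c] / total_words
    return max(0.0, math.log(joint / (pw * pc)))
```

Stacking `ppmi` values for every word-context pair yields the PPMI matrix, which the course then factorizes (e.g. via SVD) to obtain dense word vectors.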
Assessment Elements

  • non-blocking Homework
  • non-blocking Presence on all lectures and seminars
  • non-blocking Exam
Interim Assessment

  • Interim assessment (1 semester)
    0.5 * Exam + 0.4 * Homework + 0.1 * Presence on all lectures and seminars
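The weighted formula above can be computed directly; the grades in the usage comment are hypothetical examples, and any rounding policy is left out as an assumption.

```python
# Interim grade per the formula: 0.5 * Exam + 0.4 * Homework + 0.1 * Presence.
def interim_grade(exam, homework, presence):
    """Weighted interim assessment grade (all components on the same scale)."""
    return 0.5 * exam + 0.4 * homework + 0.1 * presence

# e.g. interim_grade(8, 6, 10) is approximately 7.4
```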