Research Problems in Natural Language Processing
- The learning objective of the course “Research Problems in Natural Language Processing” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern NLP tasks, such as:
  • distributional semantics;
  • topic modelling;
  • sequence labelling;
  • structured learning;
  • text classification and clustering;
  • unsupervised information extraction.
- Knowledge of models such as word embeddings, latent Dirichlet allocation, conditional random fields, structured SVMs, convolutional neural networks and recurrent neural networks, and of tasks such as POS-tagging and syntactic parsing
- Knowledge of ongoing developments in NLP
- Hands-on experience with large-scale NLP problems
- Knowledge of how to design, develop and evaluate NLP programs using the Python programming language
- Introduction to NLP, basic concepts. Basic definitions of NLP tasks and methods, a brief introduction to linguistics, evaluation metrics and language resources.
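Evaluation metrics such as precision, recall and F1 recur throughout the course; a minimal sketch of how they are computed, using hypothetical gold and predicted label lists with a made-up positive label "ENT":

```python
# Precision, recall and F1 over aligned gold/predicted label sequences.
# The label "ENT" and the example lists below are illustrative only.
def precision_recall_f1(gold, pred, positive="ENT"):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["ENT", "O", "ENT", "ENT", "O"]
pred = ["ENT", "ENT", "ENT", "O", "O"]
p, r, f = precision_recall_f1(gold, pred)
```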
- Text preprocessing: tokenization, POS-tagging, syntax parsing. Rule-based and machine-learning-based tokenization and POS-tagging, constituency and dependency grammars, syntactic parsing.
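As a flavour of the rule-based approach covered here, a minimal regex tokenizer that keeps punctuation as separate tokens (a toy rule set, not a full tokenizer):

```python
import re

# One rule: a token is either a run of word characters or a single
# non-space punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

tokens = tokenize("Hello, world!")  # → ["Hello", ",", "world", "!"]
```

Real tokenizers add rules for abbreviations, contractions, numbers and URLs, or learn segmentation from data.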
- Topic modelling. Vector space model and dimensionality reduction: latent semantic indexing, latent Dirichlet allocation, dynamic topic models, hierarchical Dirichlet process, autoencoders.
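A minimal sketch of latent semantic indexing, assuming a hypothetical toy term-document count matrix: truncated SVD projects documents into a low-dimensional latent "topic" space.

```python
import numpy as np

# Toy term-document matrix; rows are (made-up) terms, columns are documents.
X = np.array([
    [2, 1, 0, 0],   # "neural"
    [1, 2, 0, 0],   # "network"
    [0, 0, 2, 1],   # "topic"
    [0, 0, 1, 2],   # "model"
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of latent dimensions to keep
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in latent space
```

Documents 1–2 and documents 3–4 end up close to each other in the reduced space, reflecting their shared vocabulary.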
- Distributional semantics. Embedding models: positive pointwise mutual information matrix decomposition, singular value decomposition, word2vec, GloVe, StarSpace, AdaGram, etc.
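A minimal sketch of the PPMI part of this topic, computed from co-occurrence counts in a hypothetical two-sentence corpus with a one-word context window:

```python
import math
from collections import Counter

corpus = [["cats", "chase", "mice"], ["dogs", "chase", "cats"]]

# Count (word, context) pairs within a window of one word on each side.
pair_counts, word_counts, total = Counter(), Counter(), 0
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                pair_counts[(w, sent[j])] += 1
                word_counts[w] += 1
                total += 1

def ppmi(word, context):
    joint = pair_counts[(word, context)] / total
    if joint == 0:
        return 0.0
    marginal = (word_counts[word] / total) * (word_counts[context] / total)
    return max(0.0, math.log(joint / marginal))
```

Decomposing the resulting PPMI matrix with SVD yields dense word vectors, which is the classical counting-based counterpart to word2vec.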
- Sequence labelling. Named entity recognition, relation and event extraction, and POS-tagging as sequence labelling tasks. Hidden Markov model, maximum entropy Markov model, conditional random fields, recurrent neural networks.
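A minimal sketch of HMM decoding with the Viterbi algorithm on a toy two-tag POS problem; all probabilities below are made up for illustration:

```python
states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.1, "bark": 0.6}}

def viterbi(words):
    # best[t][s]: (probability of the best path ending in state s, backpointer)
    best = [{s: (start[s] * emit[s].get(words[0], 1e-6), None) for s in states}]
    for w in words[1:]:
        best.append({
            s: max(
                (best[-1][prev][0] * trans[prev][s] * emit[s].get(w, 1e-6), prev)
                for prev in states
            )
            for s in states
        })
    # Follow backpointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    return path[::-1]
```

CRFs replace these locally normalized probabilities with globally normalized feature scores, but the same dynamic program does the decoding.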
- Structured learning. Syntax parsing and semantic role labelling as structured learning tasks. Structured SVM and structured perceptron.
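A minimal sketch of the structured perceptron for sequence tagging, with unary word-tag and pairwise tag-tag features. The argmax here is exhaustive over all tag sequences, which is fine for toy inputs; real systems use Viterbi. The tag set and training data are hypothetical.

```python
from itertools import product

TAGS = ["N", "V"]

def features(words, tags):
    feats = [("unary", w, t) for w, t in zip(words, tags)]
    feats += [("pair", a, b) for a, b in zip(tags, tags[1:])]
    return feats

def score(weights, words, tags):
    return sum(weights.get(f, 0.0) for f in features(words, tags))

def predict(weights, words):
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, words, tags))

def train(data, epochs=5):
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            pred = predict(weights, words)
            if list(pred) != list(gold):
                # Reward features of the gold sequence, penalize the prediction.
                for f in features(words, gold):
                    weights[f] = weights.get(f, 0.0) + 1.0
                for f in features(words, pred):
                    weights[f] = weights.get(f, 0.0) - 1.0
    return weights

data = [(["dogs", "bark"], ["N", "V"]), (["cats", "sleep"], ["N", "V"])]
w = train(data)
```

The structured SVM uses the same feature decomposition but trains with a large-margin objective instead of perceptron updates.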
- Text classification and clustering. Baseline methods for text classification: naïve Bayes, logistic regression, fastText, convolutional neural networks, hard attention mechanism for recurrent neural networks.
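A minimal sketch of the first baseline, multinomial naïve Bayes with add-one smoothing, on a hypothetical two-class toy corpus:

```python
import math
from collections import Counter, defaultdict

train_docs = [
    ("great fun great plot", "pos"),
    ("fun and moving", "pos"),
    ("boring plot", "neg"),
    ("dull and boring", "neg"),
]

# Class priors and per-class word counts from the training data.
class_docs = Counter(label for _, label in train_docs)
word_counts = defaultdict(Counter)
for text, label in train_docs:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    def log_prob(label):
        lp = math.log(class_docs[label] / len(train_docs))
        total = sum(word_counts[label].values())
        for w in text.split():
            if w in vocab:  # ignore out-of-vocabulary words
                lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_docs, key=log_prob)
```

Despite its independence assumption, this baseline is hard to beat on small datasets and sets the bar for the neural models in this topic.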
- Unsupervised information extraction. The OpenIE paradigm. SOV triple extraction, classification and clustering. Temporal textual data analysis.
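A minimal sketch of triple extraction in the OpenIE spirit: a single noun-verb-noun pattern over toy POS-tagged input. Real OpenIE systems operate over dependency parses with far richer patterns; the example sentence is illustrative.

```python
# Extract (subject, verb, object) triples wherever a NOUN-VERB-NOUN
# window occurs in a POS-tagged sentence.
def extract_triples(tagged):
    triples = []
    for i in range(1, len(tagged) - 1):
        (s, st), (v, vt), (o, ot) = tagged[i - 1], tagged[i], tagged[i + 1]
        if st == "NOUN" and vt == "VERB" and ot == "NOUN":
            triples.append((s, v, o))
    return triples

sent = [("Google", "NOUN"), ("acquired", "VERB"), ("DeepMind", "NOUN")]
```

Extracted triples are then classified and clustered to group paraphrases of the same relation, which is where the unsupervised part of the topic comes in.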