Master 2021/2022

Unstructured Data Analysis

Category 'Best Course for New Knowledge and Skills'
Area of studies: Applied Mathematics and Informatics
When: 1st year, modules 1 and 2
Mode of studies: offline
Open to: students of all HSE University campuses
Instructors: Ilia Karpov
Master’s programme: Applied Statistics with Network Analysis
Language: English
ECTS credits: 4
Contact hours: 40

Course Syllabus

Abstract

This course focuses on applied methods and existing tools for information retrieval: web scraping, data preprocessing, and natural language processing. All methods considered in this course require basic knowledge of discrete mathematics and probability theory. For instance, most NLP and IR methods use conditional probability. In this course, we show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and demonstrate how these methods can be used to solve real-world problems.
Learning Objectives

  • Show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and demonstrate how these methods can be used to solve real-world problems.
Expected Learning Outcomes

  • Know and apply basic methods of text processing and analysis
  • Know the ethical aspects of text processing
  • Be able to solve problems related to language modeling
  • Be able to solve specialized tasks on text data
Course Contents

  • Introduction. Statistical text analysis
  • Vector models of word representation
  • Text classification
  • Sequence classification
  • Pretrained language models
  • Syntactic parsing
  • Machine translation
  • Text generation
  • Data annotation and active learning
  • Question answering systems
  • Multimodal methods
  • Multilingual methods
  • Text processing in medicine
  • Information retrieval
  • Ethical issues in text processing
Assessment Elements

  • non-blocking Cumulative mark for the work during the module
  • non-blocking Final exam
Interim Assessment

  • 2021/2022 2nd module
    0.6 * Final exam + 0.4 * Cumulative mark for the work during the module
Bibliography

Recommended Core Bibliography

  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399

Recommended Additional Bibliography

  • Cohen, S. (2019). Bayesian Analysis in Natural Language Processing: Second Edition. San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157