Master 2019/2020

Research Seminar "Intelligent Systems and Structural Analysis"

Type: Elective course (Data Science)
Area of studies: Applied Mathematics and Informatics
When: 2nd year, modules 1 and 2
Mode of studies: offline
Instructors: Boris Galitsky, Dmitry Ilvovsky, Konstantin Yakovlev
Master’s programme: Data Science
Language: English
ECTS credits: 8
Contact hours: 32

Course Syllabus

Abstract

The goal of the course is to develop the professional skills students need for independent analytical work in applied areas of computer science. The course also aims to improve students' skills in developing their own research projects related to dialogue systems and chatbots. It focuses on the analysis of the development of scientific and industrial linguistic systems and encourages students to attend scientific colloquia at the university, especially at the Faculty of Computer Science.
Learning Objectives

  • The Research Seminar should help students develop the basic skills needed to conduct and present their own research, and motivate them to engage in scientific activity.
Expected Learning Outcomes

  • Know the basic principles of developing task-oriented linguistic dialogue systems.
  • Formulate the task and goals of independent research and/or the development of a scientific software system.
  • Prepare a presentation of their own research and/or scientific software system.
  • Know the main principles of social bots.
  • Know the main principles of task-oriented bots.
  • Know the fundamental approaches to natural language understanding and dialogue management in task-oriented dialogue systems.
  • Know the basic principles of assuring chatbot relevance at the syntactic level.
  • Know the basic principles of question answering (Q/A) for bots.
  • Know the basic principles of discourse-level structures.
  • Know the basic principles of building taxonomies and thesauri for chatbots.
  • Know the basic principles of the chatbot content processing pipeline.
  • Know the basic principles of managing rhetorical agreement in dialogue utterances.
  • Know the basic principles of discourse-level dialogue management.
  • Know the basic principles of argumentation for chatbots.
Course Contents

  • A basic chatbot
    • Building transactional chatbots with Api.ai;
    • Building an FAQ chatbot with Microsoft QnA Maker;
    • A chatbot with rule-based dialogue management.
  • Social Bots
    • Main principles.
  • Task-oriented Bots
    • Main principles.
  • NL Understanding
    • Introduction to NLP and NLU.
  • Assuring chatbot relevance at the syntactic level
    • Syntactic generalization in search and relevance assessment;
    • Generalizing portions of text;
    • Generalizing at various levels: from words to paragraphs;
    • Equivalence transformation on phrases;
    • Simplified example of generalization of sentences;
    • From syntax to inductive semantics;
    • Nearest-neighbor learning of generalizations;
    • Syntactic generalization-based search engine and its evaluation;
    • User interface of the search engine;
    • Qualitative evaluation of search;
    • Evaluation of web search relevance improvement;
    • Evaluation of product search;
    • Comparison with other means of search relevance improvement;
    • Evaluation of text classification problems;
    • Comparative performance analysis in text classification domains;
    • Example of recognizing meaningless sentences;
    • Commercial evaluation of text similarity improvement.
  • Q/A for Bots: Semantic headers and semantic skeletons
  • Learning discourse-level structures
    • Answering paragraph-size questions;
    • From sentence-level to paragraph-level generalization;
    • Rhetorical structures and speech acts as inter-sentence links;
    • Adapting RST for multi-sentence search;
    • Adapting Speech Act Theory for multi-sentence search;
    • Parse thickets and their graph representation;
    • Equivalence transformation of phrases;
    • Finding similarity between two paragraphs of text;
    • How coreferences help search recall;
    • How rhetorical relations improve search accuracy;
    • Thicket phrases and their generalization;
    • Example of a parse thicket;
    • Generalization of parse thickets;
    • Generalization for RST arcs;
    • Generalization for CA arcs;
    • Computing maximal common sub-PTs;
    • Architecture of the PT processing system;
    • Evaluation of PT-supported search relevance;
    • Evaluation settings;
    • Pair-wise sentence generalization for question-answer similarity;
    • Single-sentence query with an answer distributed across multiple sentences;
    • Query is a paragraph and answer is a paragraph;
    • Phrase-based and graph-based implementation of generalization;
    • Comparison of search performance with other studies.
  • Building taxonomies and thesauri for chatbots
    • Improving search relevance by taxonomies;
    • Must-occur keywords;
    • Must-occur keywords in a taxonomy;
    • Constructing a relevance score function;
    • Examples of filtering answers based on a taxonomy;
    • Taxonomy-based algorithm for filtering search results;
    • Building taxonomies by web mining;
    • Building a taxonomy by generalizing search results;
    • Practical considerations;
    • Evaluation of search relevance improvement by taxonomies;
    • Evaluation settings of search relevance improvement;
    • Vertical search;
    • Web search relevance improvement;
    • Taxonomy-supported search engine in the news domain;
    • Taxonomies for query expansion;
    • Using search in the Similarity component;
    • Running the taxonomy learner.
  • Chatbot content processing pipeline
    • From search to personalized recommendations;
    • A content pipeline and its relevance-related problems;
    • Content pipeline architecture;
    • Content processing engines;
    • Content processing units;
    • Harvesting unit;
    • Content mining unit;
    • Taxonomy unit;
    • Opinion mining unit;
    • De-duplication unit;
    • Search Engine Marketing unit;
    • Speech recognition semantics unit;
    • Search unit;
    • Personalization unit;
    • Generalization of texts;
    • Simplified example of generalization of sentences;
    • Sample generalization between phrases;
    • Tree Kernel approach for text similarity;
    • Phrase-level generalization;
    • Generalization of expressions of interest;
    • Personalization algorithm as intersection of likes;
    • Mapping categories of interest / taxonomies;
    • Defeasible logic programming-based rule engine;
    • Content pipeline algorithms;
    • Taxonomy construction algorithm;
    • De-duplication algorithms;
    • Sentiment analysis algorithm;
    • Search engine marketing ad construction algorithm.
  • Managing Rhetorical Agreement in Dialogue Utterances
    • Communicative Discourse Trees;
    • Representing rhetorical relations and communicative actions;
    • Greedy representations for a Q/A pair;
    • Communicative actions and their generalization;
    • Generalization for RST relations;
    • Representing a Request-Response chain;
    • Classification settings for Request-Response pairs;
    • Nearest Neighbor graph-based classification;
    • Thicket Kernel learning for CDT;
    • Implementation of a Rhetorical Agreement classifier;
    • Discourse Structure-Driven Dialogue Management;
    • Maintaining cohesive session flow in a chatbot;
    • Personalized Domain Exploration Scenarios;
    • Navigation with the Extended Discourse Tree;
    • Recognizing valid and invalid R-R pairs;
    • CDT Construction Task;
    • Managing dialogues and question answering;
    • Analytical approaches to R-R agreement;
    • Rhetorical relations and argumentation.
  • Discourse-level dialogue management
    • Finding answers with optimal rhetoric representation;
    • Adjusting the rhetoric representation of an answer to that of a question;
    • Maintaining a sequence of discourse trees;
    • Identifying rhetoric correlation;
    • Building a dialogue structure from the discourse tree of a query;
    • Maintaining communicative discourse for Q and A;
    • Learning the complement relation.
  • Data for chatbot training
  • Argumentation for chatbots
Assessment Elements

  • Presentation (non-blocking)
    Progress report on the programming project.
    Speaking time: no more than 15 minutes.
  • Programming project (non-blocking)
    Report on the programming project: an individual written report and a group presentation.
Interim Assessment

  • Interim assessment (module 2)
    The final mark is calculated as O_final = 1 · O_project, i.e. it equals the project mark.
    The assessment also includes submitting a final report on the project and defending the project publicly in the form of a presentation.
Bibliography

Recommended Core Bibliography

  • Manning C. D., Schütze H. Foundations of statistical natural language processing. – MIT Press, 1999. – 719 pp.

Recommended Additional Bibliography

  • Perkins J. Python text processing with NLTK 2.0 cookbook. – Packt Publishing Ltd, 2010. – 336 pp.