
Well-interpretable Methods of Knowledge Discovery and Knowledge Representation

Priority areas of development: mathematics

Goal of research

The research aims to develop new mathematical models, algorithms, and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, and machine learning algorithms for classifying complex objects. The developed methods, algorithms, and software tools will be applied to practical tasks.

Thus, the domain of the research comprises methods, algorithms, and software for data mining, visualization, ontology modelling, and automated text processing. The research focuses on the properties of these methods and algorithms, such as scope of application, precision, and performance, with special emphasis on interpretability.


The research is based on methods of discrete mathematics, computer science, computational linguistics, and software engineering. The main cycle for obtaining new scientific results is:

  • Generating hypotheses about patterns and dependencies in an application domain;
  • Designing mathematical models that capture these patterns;
  • Developing algorithms and software that implement the models;
  • Experimenting with pilot software on applied problems.

First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning, and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. We then implement original methods and algorithms in various components of intelligent software systems. These implementations can be tested on synthetic tasks and adopted in practical applications.

Empirical base of research

For testing purposes, we use synthetic data, data gathered from electronic scientific libraries, social media services, and collections of healthcare records, datasets from PhysioBank of the PhysioNet project (http://physionet.ecuore.org/physiobank/), and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).

Results of research

Twenty-nine scientific papers presenting results of the research were published between December 2017 and November 2018. The main results are:

  1. An in-depth review of educational data analysis, with an emphasis on the methodological foundations of a new generation of adaptive learning software.
  2. New mathematical models of knowledge representation in ontology-based adaptive learning systems were developed.
  3. An adaptive learning and assessment system with automatic item generation was developed.
  4. Analysis of cases, modern tools, challenges, and opportunities in the field of robojournalism, with a focus on systems for automatic generation of news content and comments based on ontology-controlled queries.
  5. Advances in mathematical models of the natural history of oncological diseases, such as breast cancer.
  6. Advances in algorithms for hybrid recommender systems that take context and user profile into account.
  7. A new technology stack for data analysis and recommender systems was introduced.

Level of implementation, recommendations on implementation, or outcomes of implementing the results

The results apply to a range of disciplines in which analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency, and correctness of the proposed models and methods are supported by comparative studies, testing, and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning, and text processing underlie almost all modern semantic technologies. Practical implementations of the proposed methods of data analysis were considered well interpretable by domain experts.

The research produced a synergy between several international collaborative projects of ISSA Lab and made it possible to adopt models and methods of data analysis in practical tasks in conjunction with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), institution of education NRU HSE, LORIA and LIRIS (France), the University of Zurich (Switzerland), etc.


Galitsky B., Ilvovsky D. On a Chat Bot Finding Answers with Optimal Rhetoric Representation, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017. Varna: INCOMA Ltd, 2017. P. 253-259.
Dudyrev F., Neznanov A., Maksimenkova O. V. Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning Environment for Material Science Education // Advances in Intelligent Systems and Computing. 2018. P. 1311-1315. 
Ignatov D. I. On closure operators related to maximal tricliques in tripartite hypergraphs // Discrete Applied Mathematics. 2018. Vol. 249. P. 74-84.
Galitsky B., Ilvovsky D., Kuznetsov S. Detecting logical argumentation in text via communicative discourse tree // Journal of Experimental and Theoretical Artificial Intelligence. 2018. Vol. 30. No. 5. P. 637-663.
Kuznetsov S., Makhalova T. On interestingness measures of formal concepts // Information Sciences. 2018. No. 442–443. P. 202-219.
Rubtsov V., Kamenshchikov M., Valyaev I., Leksin V., Ignatov D. I. A hybrid two-stage recommender system for automatic playlist continuation, in: 12th ACM Recommender Systems Challenge Workshop, RecSys Challenge 2018; Vancouver; Canada. Vancouver: ACM, 2018. P. 1-4.
Andreeva E., Ignatov D. I., Grachev A., Savchenko A. Extraction of Visual Features for Recommendation of Products via Deep Learning, in: Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science. Berlin: Springer, 2018. P. 201-210.
Ignatov D. I., Sinkov K., Spesivtsev P., Vrabie I. V., Zyuzin V. Tree-Based Ensembles for Predicting the Bottomhole Pressure of Oil and Gas Well Flows, in: Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science. Berlin: Springer, 2018. P. 221-233.
Kanovich M., Kuznetsov S., Nigam V., Scedrov A. A Logical Framework with Commutative and Non-commutative Subexponentials, in: 9th International Joint Conference on Automated Reasoning, 2018. P. 228-245.
Alturki M. A., Kirigin T. B., Nigam V., Talcott C., Kanovich M., Scedrov A. Statistical Model Checking of Distance Fraud Attacks on the Hancke-Kuhn Family of Protocols, in: Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy. ACM, 2018. P. 60-71.
Korepanova N. Subgroup Discovery for Treatment Optimization, in: Proceedings of the first Workshop on Data Analysis in Medicine (WDAM-2017). EasyChair, 2018. P. 48-53.
Kuznetsov S., Makhalova T., Napoli A. MDL for FCA: is there a place for background knowledge?, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI/ECAI 2018). CEUR-WS, 2018.
Kuznetsov S. O., Makhalova T. P., Napoli A. How to improve the evaluation of feature sets using the minimum description length principle? (in Russian), in: Sixteenth National Conference on Artificial Intelligence with International Participation KII-2018 (September 24-27, 2018, Moscow, Russia). Conference proceedings. In 2 volumes. Moscow, 2018. P. 19-26.
Makarov I., Dmitry S., Boris L., Ignatov D. I. Predicting Winning Team and Probabilistic Ratings in Dota 2 and Counter-Strike: Global Offensive Video Games, in: Analysis of Images, Social Networks and Texts. 6th International Conference, 2017, Revised Selected Papers. Cham: Springer, 2018. P. 183-196.
Maksimenkova O. V., Neznanov A., Papushina I. O., Parinov A. On mind maps evaluation: a case of an automatic grader development, in: Advances in Intelligent Systems and Computing. ICL 2017: Teaching and Learning in a Digital World, 2018. P. 210-221.
Makhalova T., Napoli A., Kuznetsov S. A First Study on What MDL Can Do for FCA, in: CLA 2018: The 14th International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings, 2018.
Ella Y. T. Consolidated mathematical growth Model of Breast Cancer CoMBreC, in: Proceedings of the first Workshop on Data Analysis in Medicine (WDAM-2017). EasyChair, 2018. P. 19-42.
Dudyrev F., Neznanov A., Maksimenkova O. V. Providing Cognitive Scaffolding Within Computer-Supported Adaptive Learning Environment for Material Science Education, in: The Challenges of the Digital Transformation in Education. Switzerland: Springer, 2019. P. 844-853.