Well-interpretable Methods of Knowledge Discovery and Knowledge Representation

Priority areas of development: mathematics
2018
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research

The research aims to develop new mathematical models, algorithms, and software tools for data mining and knowledge discovery on data with complex structure, including text mining, graph mining, and machine learning algorithms for classifying complex objects. The developed methods, algorithms, and software tools will be applied to practical tasks.

The research thus covers methods, algorithms, and software for data mining, visualization, ontology modelling, and automated text processing. It focuses on the properties of these methods and algorithms, such as scope of application, precision, and performance, with special emphasis on interpretability.

Methodology

The research is based on methods of discrete mathematics, computer science, computational linguistics, and software engineering. The main cycle for obtaining new scientific results is:

  • Generating hypotheses about patterns and dependencies in an application domain;
  • Designing mathematical models relevant to these patterns;
  • Developing algorithms and software that implement the models;
  • Experimenting with pilot software on applied problems.

First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning, and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. Then we implement original methods and algorithms in various components of intelligent software systems. Such implementations can be tested on synthetic tasks and adopted in practical applications.

Empirical base of research

For testing purposes, we use synthetic data, data gathered from electronic scientific libraries, social media services, and collections of healthcare records, datasets from PhysioBank of the PhysioNet project (http://physionet.ecuore.org/physiobank/), and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).

Results of research

Twenty-nine scientific papers presenting the results of the research were published between December 2017 and November 2018. The main results are:

  1. An in-depth review of educational data analysis, with an emphasis on the methodological foundations of a new generation of adaptive learning software.
  2. New mathematical models of knowledge representation in ontology-based adaptive learning systems.
  3. An adaptive learning and assessment system with automatic item generation.
  4. An analysis of cases, modern tools, challenges, and opportunities in the field of robojournalism, with a focus on systems that automatically generate news content and comments based on ontology-controlled queries.
  5. Advances in mathematical models of the natural history of cancer, such as breast cancer.
  6. Advances in algorithms for hybrid recommender systems that take into account context and the user profile.
  7. A new technology stack for data analysis systems and recommender systems.

The level of implementation, recommendations on implementation or outcomes of the implementation of the results

The results apply to a range of disciplines in which the analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency, and correctness of the proposed models and methods are supported by comparative studies, testing, and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning, and text processing underlie almost all modern semantic technologies. Practical implementations of the proposed data analysis methods were found to be well interpretable by domain experts.

The research created a synergy between several international collaborative projects of ISSA Lab and made it possible to adopt models and methods of data analysis in practical tasks together with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), the educational institution NRU HSE, LORIA and LIRIS (France), the University of Zurich (Switzerland), and others.

Publications:


Maksimenkova O. V., Neznanov A., Radchenko I. Collaborative Learning in Data Science Education: A Data Expedition as a Formative Assessment Tool, in: The Challenges of the Digital Transformation in Education. Switzerland : Springer, 2019. P. 14-25.
Dudyrev F., Neznanov A., Maksimenkova O. V. Providing Cognitive Scaffolding Within Computer-Supported Adaptive Learning Environment for Material Science Education, in: The Challenges of the Digital Transformation in Education. Switzerland : Springer, 2019. P. 844-853.
Galitsky B., Ilvovsky D., Kuznetsov S. Detecting logical argumentation in text via communicative discourse tree // Journal of Experimental and Theoretical Artificial Intelligence. 2018. Vol. 30. No. 5. P. 637-663.
Kanovich M., Kuznetsov S., Nigam V., Scedrov A. A Logical Framework with Commutative and Non-commutative Subexponentials, in: 9th International Joint Conference on Automated Reasoning. Issue 10900. Springer International Publishing AG, part of Springer Nature, 2018. P. 228-245.
Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI/ECAI 2018) / Ed. by S. Kuznetsov, A. Napoli, S. Rudolph. Vol. 2149: CEUR Workshop Proceedings. CEUR-WS, 2018.
Maksimenkova O. V., Neznanov A., Papushina I. O., Parinov A. On mind maps evaluation: a case of an automatic grader development, in: Advances in Intelligent Systems and Computing. ICL 2017: Teaching and Learning in a Digital World. 2018. P. 210-221.
Ignatov D. I., Sinkov K., Spesivtsev P., Vrabie I. V., Zyuzin V. Tree-Based Ensembles for Predicting the Bottomhole Pressure of Oil and Gas Well Flows, in: Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science / Ed. by W. M. van der Aalst, V. Batagelj, G. Glavaš, D. I. Ignatov, M. Khachay, O. Koltsova, S. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Savchenko, A. Panchenko, P. M. Pardalos, M. Pelillo. Vol. 11179. Berlin : Springer, 2018. P. 221-233.
Ignatov D. I. On closure operators related to maximal tricliques in tripartite hypergraphs // Discrete Applied Mathematics. 2018. Vol. 249. P. 74-84.
Makarov I., Savostyanov D., Litvyakov B., Ignatov D. I. Predicting Winning Team and Probabilistic Ratings in Dota 2 and Counter-Strike: Global Offensive Video Games, in: Analysis of Images, Social Networks and Texts. 6th International Conference, 2017, Lecture Notes in Computer Science, Revised Selected Papers / Ed. by W. M. van der Aalst, D. I. Ignatov, M. Khachay, S. Kuznetsov, V. Lempitsky, I. A. Lomazova, A. Napoli, A. Panchenko, P. M. Pardalos, A. V. Savchenko, S. Wasserman. Vol. 10716. Cham : Springer, 2018. P. 183-196.
Andreeva E., Ignatov D. I., Grachev A., Savchenko A. Extraction of Visual Features for Recommendation of Products via Deep Learning, in: Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science / Ed. by W. M. van der Aalst, V. Batagelj, G. Glavaš, D. I. Ignatov, M. Khachay, O. Koltsova, S. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Savchenko, A. Panchenko, P. M. Pardalos, M. Pelillo. Vol. 11179. Berlin : Springer, 2018. P. 201-210.
Kuznetsov S., Makhalova T. On interestingness measures of formal concepts // Information Sciences. 2018. No. 442–443. P. 202-219.
Rubtsov V., Kamenshchikov M., Valyaev I., Leksin V., Ignatov D. I. A hybrid two-stage recommender system for automatic playlist continuation, in: 12th ACM Recommender Systems Challenge Workshop, RecSys Challenge 2018; Vancouver; Canada. Vancouver : ACM, 2018. Ch. 16. P. 1-4.
Dudyrev F., Neznanov A., Maksimenkova O. V. Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning Environment for Material Science Education // Advances in Intelligent Systems and Computing. 2018. P. 1311-1315.
Tyuryumina E. Y. Consolidated mathematical growth Model of Breast Cancer CoMBreC, in: Proceedings of the first Workshop on Data Analysis in Medicine (WDAM-2017) / Ed. by J. Baixeries, S. Boytcheva, O. Pianykh, A. Neznanov, S. Kuznetsov. Issue 6. EasyChair, 2018. Ch. 3. P. 19-42.
Korepanova N. Subgroup Discovery for Treatment Optimization, in: Proceedings of the first Workshop on Data Analysis in Medicine (WDAM-2017) / Ed. by J. Baixeries, S. Boytcheva, O. Pianykh, A. Neznanov, S. Kuznetsov. Issue 6. EasyChair, 2018. P. 48-53.
Galitsky B., Ilvovsky D. On a Chat Bot Finding Answers with Optimal Rhetoric Representation, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017. Varna : INCOMA Ltd, 2017. P. 253-259.
Kuznetsov S., Makhalova T., Napoli A. MDL for FCA: is there a place for background knowledge?, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI/ECAI 2018) / Ed. by S. Kuznetsov, A. Napoli, S. Rudolph. Vol. 2149: CEUR Workshop Proceedings. CEUR-WS, 2018.
Kuznetsov S. O., Makhalova T. P., Napoli A. How to improve the evaluation of feature sets using the minimum description length principle? (in Russian), in: Sixteenth National Conference on Artificial Intelligence with International Participation KII-2018 (September 24-27, 2018, Moscow, Russia). Conference Proceedings. In 2 volumes. Vol. 1. Moscow : RKP, 2018. P. 19-26.
Makhalova T., Napoli A., Kuznetsov S. A First Study on What MDL Can Do for FCA, in: CLA 2018: The 14th International Conference on Concept Lattices and Their Applications / Ed. by D. I. Ignatov, L. Nourine. CEUR Workshop Proceedings, 2018.
Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018 / Ed. by W. van der Aalst, V. Batagelj, G. Glavaš, D. I. Ignatov, M. Khachay, O. Koltsova, S. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Savchenko, A. Panchenko, P. M. Pardalos, M. Pelillo. Aachen : CEUR Workshop Proceedings, 2018.
Proceedings 16th Russian Conference on Artificial Intelligence (RCAI 2018) / Ed. by S. Kuznetsov, G. Osipov, V. L. Stefanuk. Issue 934. Cham : Springer, 2018.
Alturki M. A., Kirigin T. B., Nigam V., Talcott C., Kanovich M., Scedrov A. Statistical Model Checking of Distance Fraud Attacks on the Hancke-Kuhn Family of Protocols, in: Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy. ACM, 2018. P. 60-71.