Discovering and Representing Knowledge for Recommender Systems

Priority areas of development: IT and mathematics
2019
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research

The research aims to develop new mathematical models, algorithms and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, and machine learning algorithms for classifying complex objects. The developed methods, algorithms and software tools will be applied to practical tasks.

Thus, the object of the research comprises methods, algorithms and software for data mining and visualization, ontology modelling, and automatic text processing. The subject of the research is the properties of these methods and algorithms, such as scope of application, precision and performance, with special interest in interpretability (explainability).

Methodology

The research is based on methods of discrete mathematics, computer science, computational linguistics and software engineering. The main cycle for obtaining new scientific results is:

  • Proposing a hypothesis about patterns and regularities observed during data processing in application areas;
  • Creating a mathematical model (or models) suited to these patterns;
  • Developing algorithms and the software that implements them;
  • Piloting the software on applied problems.

First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. Then we implement original methods and algorithms in various components of intelligent software systems. Such implementations can be tested on synthetic tasks and adopted in practical applications.
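
As an illustration of the FCA models mentioned above, the following is a minimal sketch of formal concept enumeration, not the project's own implementation. The toy context `CONTEXT` and the function names `extent`, `intent` and `concepts` are hypothetical, and the naive closure over all attribute subsets is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical toy formal context: objects (papers) mapped to attributes (topics).
CONTEXT = {
    "p1": {"fca", "text"},
    "p2": {"fca", "graphs"},
    "p3": {"fca", "text", "graphs"},
}


def extent(attrs, context):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context):
    """Return the attributes shared by every object in objs."""
    all_attrs = set().union(*context.values())
    return frozenset(a for a in all_attrs if all(a in context[o] for o in objs))


def concepts(context):
    """Enumerate formal concepts naively by closing every attribute subset."""
    all_attrs = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            objs = extent(set(subset), context)
            # (extent, intent) pairs are closed on both sides by construction.
            found.add((objs, intent(objs, context)))
    return found


if __name__ == "__main__":
    for objs, attrs in sorted(concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(objs), sorted(attrs))
```

Each resulting pair is a maximal group of objects together with exactly the attributes they share, which is the kind of directly readable pattern that makes FCA attractive when interpretability matters.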

Empirical base of research

For testing purposes, we use synthetic data as well as data gathered from electronic scientific libraries, social media services, collections of healthcare records, datasets from HSE research collaborations, the Internet Argument Corpus, FactBank, and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).

Results of research

Between December 2018 and November 2019, 27 scientific papers with results of the research were published. The main results are:

  1. Development of neural network classifiers of natural language texts based on discourse structure.
  2. Proposal of original methods for efficient classification of network packets based on closed descriptions.
  3. Advances in mathematical models of breast cancer natural history, taking into account four main forms of the disease.
  4. Study of the data complexity of ontology queries in description logics.
  5. Study of original methods of collaboration prediction in co-authorship networks.
  6. Update of a technological stack for data gathering from open data sources.
  7. Development of automatic item generation system based on new mathematical models of domain knowledge representation.
  8. Development of engineering graphics automated assessment system based on interactive work in CAD-software.
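
To make result 5 more concrete, here is a minimal sketch of collaboration prediction in a co-authorship network. It uses the Jaccard coefficient of co-author sets, a generic textbook baseline chosen for illustration and not the original methods developed in the project. The edge list `EDGES` and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical co-authorship edges: each pair of authors shares at least one paper.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e")]


def neighbours(edges):
    """Build an undirected adjacency map from a list of author pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(edges):
    """Score every pair of authors without a joint paper by co-author overlap."""
    adj = neighbours(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already collaborators, nothing to predict
        union = adj[u] | adj[v]
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union) if union else 0.0
    return scores


if __name__ == "__main__":
    ranked = sorted(jaccard_scores(EDGES).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Pairs with the highest scores are the candidate future collaborations. A weighted variant would also need the number of joint papers per edge, which this sketch deliberately ignores.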

The level of implementation, recommendations on implementation or outcomes of the implementation of the results

The obtained results apply to a range of disciplines in which analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency and correctness of the proposed models and methods are supported by comparative studies, testing and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning and text processing underlie almost all modern semantic technologies. Domain experts considered the practical implementations of the proposed data analysis methods to be well explainable.

The conducted research produced a synergy effect across several international collaborative projects of the ISSA Lab and made it possible to adopt models and methods of data analysis in practical tasks together with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), NRU HSE as an educational institution, LORIA and LIRIS (France), TU Dresden (Germany), etc.

Publications:


Makarov I., Dmitrii Maslov, Gerasimova O., Vladimir Aliev, Alisa Korinevskaya, Sharma U., Wang H. On Reproducing Semi-dense Depth Map Reconstruction using Deep Convolutional Neural Networks with Perceptual Loss, in: Proceedings of 27th ACM International Conference on Multimedia. NY : ACM, 2019. P. 1080-1084.
Maksimenkova O. V., Neznanov A., Radchenko I. Using Data Expedition as a Formative Assessment Tool in Data Science Education: Reasoning, Justification, and Evaluation // International Journal of Emerging Technologies in Learning. 2019. Vol. 14. No. 11. P. 107-122.
Galitsky B., Ilvovsky D. On the End-to-End Argument Validation System based on Communicative Discourse Trees, in: Proceedings of the 19th Workshop on Computational Models of Natural Argument (CMNA 2019) co-located with the 14th International Conference on Persuasive Technology (PERSUASIVE 2019). CEUR Workshop Proceedings, 2019. P. 5-16.
Makhalova T., Galitsky B., Ilvovsky D. Information Retrieval Chatbots Based on Conceptual Models, in: International Conference on Conceptual Structures Vol. 11530. Springer, 2019. P. 230-238.
Gerasimova O., Makarov I. Link Prediction Regression for Weighted Co-authorship Networks, in: Advances in Computational Intelligence. IWANN 2019. Berlin : Springer, 2019. P. 667-677.
Kuznetsov S. The logic of action lattices is undecidable, in: 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2019). IEEE, 2019. Ch. 36. P. 1-9.
Korepanova N., Seibold H., Steffen V., Hothorn T. Survival Forests under Test: Impact of the Proportional Hazards Assumption on Prognostic and Predictive Forests for ALS Survival // Statistical Methods in Medical Research. 2019
Gerasimova O., Makarov I. Higher School of Economics Co-Authorship Network Study, in: Proceedings of 2nd International Conference on Computer Applications & Information Security (ICCAIS). NY : IEEE, 2019. P. 1-4.
Ilvovsky D., Galitsky B. Discourse-Based Approach to Involvement of Background Knowledge for Question Answering, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019. INCOMA Ltd, 2019. P. 373-381.
Galitsky B., Ilvovsky D. Two Discourse Tree-Based Approaches to Indexing Answers, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019. INCOMA Ltd, 2019. P. 367-373.
Ignatov D. I., Egurnov D., Tochilkin D. S. Multimodal Clustering of Boolean Tensors on MapReduce: Experiments Revisited, in: Supplementary Proceedings ICFCA 2019 Conference and Workshops Vol. 2378. CEUR Workshop Proceedings, 2019. P. 137-151.
Ignatov D. I., Egurnov D. Triclustring Toolbox, in: Supplementary Proceedings ICFCA 2019 Conference and Workshops Vol. 2378. CEUR Workshop Proceedings, 2019. P. 65-69.
Makarov I., Gerasimova O. Predicting Collaborations in Co-authorship Network, in: Proceedings of the 14th International Workshop on Semantic and Social Media Adaptation and Personalization. NY : IEEE, 2019. P. 1-6.
Kodryan M., Grachev A., Ignatov D. I., Vetrov D. Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks, in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) Issue W19-43. Florence, Italy: Association for Computational Linguistics, 2019. P. 40-48.
Galitsky B., Ilvovsky D., Goncharova E. On a Chatbot Providing Virtual Dialogues, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019. INCOMA Ltd, 2019. P. 382-387.