Goal of research
The research aims to develop new mathematical models, algorithms, and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, machine learning algorithms for classification of complex objects, and others. The developed methods, algorithms, and software tools will be applied to practical tasks.
Thus, the object of the research comprises methods, algorithms, and software for data mining and visualization, ontology modelling, and automatic text processing. The subject of the research is the properties of these methods and algorithms, such as scope of application, precision, and performance, with special interest in interpretability (explainability).
The research is based on methods of discrete mathematics, computer science, computational linguistics, and software engineering. The main cycle for obtaining new scientific results is:
- Proposing hypotheses about patterns and regularities observed during data processing in application areas;
- Creating a mathematical model (or models) suitable for these patterns;
- Developing algorithms and the software that implements them;
- Piloting the software on applied problems.
First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning, and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. Then we implement original methods and algorithms in various components of intelligent software systems. Such implementations can be tested on synthetic tasks and adopted in practical applications.
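As an illustration of the FCA models mentioned above, the following is a minimal sketch of the two derivation operators and a brute-force enumeration of formal concepts. The toy context `CONTEXT` and all function names are hypothetical and chosen only for this example; the enumeration is exponential in the number of attributes and is not one of the algorithms developed in the project.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to its attribute set.
CONTEXT = {
    "doc1": {"graph", "mining"},
    "doc2": {"graph", "text"},
    "doc3": {"graph", "mining", "text"},
}


def extent(attrs, context):
    """Return the objects that have every attribute in attrs."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def intent(objs, context):
    """Return the attributes shared by every object in objs."""
    # By convention, the intent of the empty object set is all attributes.
    shared = set().union(*context.values())
    for obj in objs:
        shared &= context[obj]
    return shared


def closure(attrs, context):
    """Closed description generated by attrs: intent of its extent."""
    return intent(extent(attrs, context), context)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Brute force for illustration only; practical FCA algorithms avoid
    visiting all subsets.
    """
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            closed = frozenset(closure(set(subset), context))
            concepts.add((frozenset(extent(closed, context)), closed))
    return concepts


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each printed pair is a formal concept: a maximal set of objects together with the maximal set of attributes they share. Such closed descriptions are directly readable by a domain expert, which is the property that makes FCA-based models attractive when interpretability matters.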
Empirical base of research
For testing purposes, we use synthetic data as well as data gathered from electronic scientific libraries, social media services, collections of healthcare records, datasets from HSE research collaborations, the Internet Argument Corpus, FactBank, and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).
Results of research
Twenty-seven scientific papers presenting results of the research were published between December 2018 and November 2019. The main results are:
- Development of neural network classifiers of natural language texts based on discourse structure.
- Proposal of original methods for efficient classification of network packets based on closed descriptions.
- Advances in mathematical models of breast cancer natural history taking into account four main forms of the disease.
- Research on the data complexity of ontology queries in description logics.
- Research on original methods of collaboration prediction in co-authorship networks.
- Update of the technology stack for data gathering from open data sources.
- Development of an automatic item generation system based on new mathematical models of domain knowledge representation.
- Development of an automated assessment system for engineering graphics based on interactive work in CAD software.
The level of implementation, recommendations on implementation or outcomes of the implementation of the results
The obtained results apply to a range of disciplines in which analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).
The effectiveness, efficiency, and correctness of the proposed models and methods are supported by comparative studies, testing, and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning, and text processing underlie almost all modern semantic technologies. In practical implementations, domain experts considered the proposed methods of data analysis to be well explainable.
The conducted research created a synergy among several international collaborative projects of ISSA Lab and made it possible to adopt models and methods of data analysis in practical tasks in cooperation with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), the educational institution NRU HSE, LORIA and LIRIS (France), TU-Dresden (Germany), and others.