Goal of research
The research aims to develop new mathematical models, algorithms, and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, and machine learning algorithms for classifying complex objects. The developed methods, algorithms, and software tools will be applied to practical tasks.
Thus, the domain of the research is methods, algorithms, and software for data mining, visualization, ontology modelling, and automated text processing. The research focuses on the properties of these methods and algorithms, such as scope of application, precision, and performance, with special emphasis on interpretability.
The research is based on methods of discrete mathematics, computer science, computational linguistics, and software engineering. The main cycle for obtaining new scientific results is:
- Generating hypotheses about patterns and dependencies in an application domain;
- Designing mathematical models that describe how these patterns arise;
- Developing algorithms and software that implement the models;
- Running experiments with pilot software on applied problems.
First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning, and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. We then implement original methods and algorithms in various components of intelligent software systems. These implementations can be tested on synthetic tasks and adopted in practical applications.
Empirical base of research
For testing purposes, we use synthetic data as well as data gathered from electronic scientific libraries, social media services, and collections of healthcare records; datasets from PhysioBank of the PhysioNet project (http://physionet.ecuore.org/physiobank/); and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), etc.
Results of research
A total of 29 scientific papers presenting the results of the research were published between December 2017 and November 2018. The main results are:
- An in-depth review of educational data analysis, with an emphasis on the methodological foundations of a new generation of adaptive learning software.
- New mathematical models of knowledge representation in ontology-based adaptive learning systems were developed.
- An adaptive learning and assessment system with automatic item generation was developed.
- An analysis of cases, modern tools, challenges, and opportunities in the field of robojournalism, with a focus on systems that automatically generate news content and comments based on ontology-controlled queries.
- Advances in mathematical models of the natural history of oncological diseases, such as breast cancer.
- Advances in algorithms for hybrid recommender systems that take into account context and the user profile.
- A new technology stack for data analysis systems and recommender systems was introduced.
The level of implementation, recommendations on implementation or outcomes of the implementation of the results
The obtained results apply to a range of disciplines in which the analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).
The effectiveness, efficiency, and correctness of the proposed models and methods are supported by comparative studies, testing, and practical adoption. The level of implementation varies across methods and software tools. The new theoretical results are in FCA, machine learning, and text processing, areas that underlie almost all modern semantic technologies. The practical implementations of the proposed data analysis methods were found to be readily interpretable by domain experts.
The research produced a synergy between several international collaborative projects of ISSA Lab and made it possible to apply models and methods of data analysis to practical tasks in collaboration with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), NRU HSE as an educational institution, LORIA and LIRIS (France), the University of Zurich (Switzerland), etc.