Thus, the object of the research comprises methods, algorithms and software tools for data mining and visualization, ontology modelling, automatic text processing, etc. The subject of the research is the characteristics of these methods, such as their limits of applicability, performance and efficiency.
The research is grounded in discrete mathematics, computer science and software engineering. First, we consider mathematical models based on Formal Concept Analysis (FCA), multimodal clustering and machine learning. In addition, we use methods of computational linguistics and ontology modelling. We then implement original methods and algorithms in intelligent software of various kinds. These implementations can be tested on synthetic tasks and adopted in practical applications.
Empirical base of research:
For testing purposes, we use synthetic data and widely used datasets from open data sources such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), the MovieLens service (https://movielens.org) and the ImhoNet service (http://imhonet.ru), as well as collected social network datasets.
Results of research:
The main results are:
- Formal methods for constructing and applying modern medical ontologies.
- New machine learning methods and context-aware collaborative filtering methods, together with their applications to real-world problems, including the following algorithms:
  - The context-aware recommender algorithm based on Boolean matrix factorization.
  - The algorithm for text classification into abstract classes based on discourse structure.
  - The lazy associative graph classification algorithm.
- A large collection of information sources and test datasets, assembled in the framework of theoretical studies in FCA, clustering and biclustering, and text processing (more than 150 new publications and more than 60 GB of new collections of synthetic and real data), in collaboration with our partners: the D. Rogachev Federal Scientific and Clinical Centre of Pediatric Hematology, Oncology and Immunology (Russia), LORIA and LIRIS (France), etc.
- Improved efficiency of the implementations of basic FCA algorithms, namely those for calculating stability indices and estimating the computational complexity of algorithms.
- Extension of DOD-DMS (Dynamical Ontology-Driven Data Mining System) to support preprocessing of data from additional kinds of external data sources, more efficient intermediate storage of data collections with complex structure, and efficient text indexing of natural-language fields of collection elements.
- Extension of the Formal Concept Analysis Research Toolbox (FCART, based on DOD-DMS) in the field of structural data processing, and adoption of a new distributed architecture for the system.
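To illustrate what calculating a stability index involves, the following minimal sketch computes the intensional stability of a formal concept by brute force, using the standard definition: the fraction of subsets of the extent whose common attributes coincide with the intent. The toy context and all function names are hypothetical and chosen for illustration only; this is not the implementation used in FCART, and the enumeration is exponential in the size of the extent, which is exactly why efficient implementations matter.

```python
from itertools import combinations


def common_attributes(objects, context, all_attributes):
    """Derivation operator: attributes shared by every object in `objects`.

    For the empty set of objects this returns all attributes.
    """
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def intensional_stability(extent, intent, context, all_attributes):
    """Brute-force stability: share of subsets C of the extent with C' == intent."""
    extent = sorted(extent)
    intent = set(intent)
    hits = 0
    for size in range(len(extent) + 1):
        for subset in combinations(extent, size):
            if common_attributes(subset, context, all_attributes) == intent:
                hits += 1
    return hits / 2 ** len(extent)


# Hypothetical toy context: object -> set of attributes it has.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"a", "c"},
}
attributes = {"a", "b", "c"}

# ({g1, g2}, {a, b}) is a formal concept of this context.
stability_ab = intensional_stability({"g1", "g2"}, {"a", "b"}, context, attributes)
print(stability_ab)
```

In this toy context, two of the four subsets of {g1, g2} (namely {g1} and {g1, g2}) have exactly {a, b} as their common attributes, so the sketch yields a stability of 0.5. A concept with higher stability depends less on any individual object of its extent.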
Level of implementation, recommendations on implementation or outcomes of the implementation of the results:
In 2015, 29 scientific papers were published.
The obtained results are applicable across a range of disciplines in which the analysis of large datasets is in high demand and inevitably requires the participation of a domain expert (medical informatics, bioinformatics, sociology, logistics, criminology, etc.).
The effectiveness, efficiency and correctness of the proposed models and methods are confirmed by comparative studies, testing and practical use. The level of integration achieved varies across methods and software tools. New theoretical results in FCA have been implemented in FCART. The new functionality of FCART is actively used in teaching at the Faculty of Computer Science, NRU HSE, in the scientific studies of the lab, and in Nancy, Clermont-Ferrand and Nicosia.
The results of the research demonstrate synergy effects from integrating several models and methods of data analysis within a unified intelligent information system. Further development of the FCART platform to increase the efficiency of scientific research is the main task of future work.