A related task is the training of HSE master's and bachelor's students specializing in the aforementioned fields. Data-mining research has recently become highly important for the development of a new generation of intelligent systems. Web data mining in particular is growing in importance; it includes social network analysis, recommender systems, distributed databases of textual documents, etc.
Data-mining research must ultimately deal with large datasets, and large distributed databases in particular require new approaches and highly efficient algorithms. Employing Formal Concept Analysis (FCA) models simplifies the discovery and mining of hidden knowledge in such large datasets.
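To make the FCA machinery concrete: a formal concept of a context (G, M, I) is a pair (A, B) where the extent A is exactly the set of objects sharing the intent B, and vice versa. The sketch below (toy data and names, not from the project's collections) enumerates all formal concepts of a small context by closing attribute subsets; this naive approach is exponential in the number of attributes and is for illustration only.

```python
from itertools import combinations

# Toy formal context: object -> set of attributes (hypothetical data).
context = {
    "duck":  {"flies", "swims"},
    "swan":  {"flies", "swims"},
    "dog":   {"runs"},
    "horse": {"runs"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects possessing every attribute in attrs (the derivation B')."""
    return {g for g, m in context.items() if attrs <= m}

def intent(objs):
    """Attributes shared by every object in objs (the derivation A')."""
    if not objs:
        return set(attributes)
    return set.intersection(*(context[g] for g in objs))

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing attribute subsets.

    Exponential in |M|; fine for toy contexts, not for real datasets.
    """
    seen = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            a = extent(set(subset))
            if frozenset(a) not in seen:
                seen.add(frozenset(a))
                yield (a, intent(a))

concepts = sorted(formal_concepts(), key=lambda c: len(c[0]))
for a, b in concepts:
    print(sorted(a), "->", sorted(b))
```

Ordered by extent size, the four concepts form the concept lattice of this context, from the bottom concept (empty extent, all attributes) to the top (all objects, shared attributes only).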
The main goal of the research is to develop and implement new methods and algorithms for analysing structured and unstructured data, to develop software tools for distributed data processing, and to apply these tools to practical tasks. The object of the research thus consists of methods, algorithms, and software tools for mining structured and unstructured data. The subject of the research is the performance and efficiency of these methods. We first considered methods based on FCA, multimodal clustering, and computational linguistics.
The main results include the following:
1) Collecting a large number of information sources and test datasets within the framework of theoretical studies in FCA, clustering and biclustering, and text processing (more than 80 new sources and more than 60 GB of new collections of synthetic and real data, in collaboration with our partners: Yandex, the Dmitry Rogachev Clinical Center, Digital Society Lab, LORIA and LIRIS (France), etc.; datasets of more than 230 TB have been prepared for further analysis);
2) Increasing the efficiency of implementations of basic FCA algorithms, specifically those for building concept lattices and computing stability indices;
3) Creating a prototype of an original programming module for managing pattern structures, and testing this component on objects represented by sequences and graphs;
4) Creating several versions of methods and algorithms for clustering and classification of triadic datasets, and testing their implementations in Web recommender services, contextual advertising, and crowdsourcing;
5) Developing a DOD-DMS (Dynamical Ontology-Driven Data Mining System) for preprocessing data from external sources, local storage of complexly structured data, and efficient indexing of natural-language text;
6) Developing the Formal Concept Analysis Research Toolbox (FCART, based on DOD-DMS): releasing version 0.8 and transitioning to development of version 0.9; finalizing the software for analysing various kinds of formal concept indices; and developing the software for processing pattern structures, a report editor, and an embedded scripting language.
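The stability indices mentioned in item 2 admit a compact definition: the (intensional) stability of a concept (A, B) is the fraction of subsets C of the extent A whose derived intent C' still equals B, so a stable concept survives the removal of individual objects. A minimal sketch on a toy context (the example data is illustrative, not from the project), using brute-force enumeration of subsets, which is exponential in |A|:

```python
from itertools import combinations

# Toy context (hypothetical data): object -> set of attributes.
context = {
    "duck":  {"flies", "swims"},
    "swan":  {"flies", "swims"},
    "goose": {"flies", "swims"},
    "dog":   {"runs"},
}

def intent(objs):
    """Attributes shared by every object in objs (derivation operator)."""
    if not objs:
        return set().union(*context.values())
    return set.intersection(*(context[g] for g in objs))

def stability(extent_a, intent_b):
    """Stability of concept (A, B): |{C subset of A : C' = B}| / 2^|A|."""
    a = sorted(extent_a)
    hits = sum(
        1
        for r in range(len(a) + 1)
        for c in combinations(a, r)
        if intent(set(c)) == set(intent_b)
    )
    return hits / 2 ** len(a)

# Concept ({duck, swan, goose}, {flies, swims}): only the empty subset
# fails (its intent is all attributes), giving 7 of 8 subsets.
print(stability({"duck", "swan", "goose"}, {"flies", "swims"}))  # 0.875
```

Efficient implementations avoid this exponential enumeration; the point here is only the definition being computed.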
The obtained results apply to a spectrum of disciplines where the analysis of large datasets is in high demand and inevitably requires the participation of expert analysts: medical informatics, bioinformatics, sociology, logistics, criminology, etc.
The effectiveness, efficiency, and correctness of the proposed models and methods are confirmed by comparative studies, testing, and practical usage. The achieved level of integration varies across methods and software tools. New theoretical results in FCA are implemented quite comprehensively in FCART. Preliminary versions of FCART are actively used in teaching at the Department of Applied Mathematics and Computer Science, in the laboratory's scientific studies, at the Dmitry Rogachev Clinical Center, and at universities in Dresden, Nancy, Clermont-Ferrand, and Nicosia.
The conducted research had a synergistic effect: it allowed several models and methods of data analysis to be integrated within the framework of a unified intelligent information system. Further development of this platform, to increase the efficiency of our scientific research and of the software tools we build, is a basic task for our future work.