
Data mining based on lattices of closed descriptions and applied ontologies

Priority areas of development: mathematics
2015
Department: Scientific-Educational Laboratory for Intelligent Systems and Structural Analysis
The project has been carried out as part of the HSE Program of Fundamental Studies.

The project aims to develop new mathematical models, algorithms and software tools for ontology-driven intelligent analysis of large textual and structured data, machine learning algorithms for classifying complex objects, a structural mathematical model for representing natural-language texts, and related methods. In addition, the developed tools are intended to be applied to a range of practical tasks.

Thus, the object of the research comprises methods, algorithms and software tools for data mining and visualization, ontology modelling, automatic text processing, etc. The subject of the research is the characteristics of these methods, such as their limits of applicability, performance and efficiency.

Methodology:

The research is grounded in discrete mathematics, computer science and software engineering. First, we consider mathematical models based on Formal Concept Analysis (FCA), multimodal clustering and machine learning. In addition, we use methods of computational linguistics and ontology modelling. We then implement original methods and algorithms in intelligent software of various kinds. These implementations can be tested on synthetic tasks and adopted in practical applications.
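To make the FCA models mentioned above concrete, the following minimal sketch enumerates the formal concepts of a tiny binary context by closing every attribute subset. The context data and the function names (`all_attributes`, `extent`, `intent`, `concepts`) are hypothetical and chosen for this example only. The brute-force enumeration is exponential in the number of attributes and is not meant to represent the algorithms used in the project's software.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attribute set.
CONTEXT = {
    "doc1": {"a", "b"},
    "doc2": {"b", "c"},
    "doc3": {"a", "b", "c"},
}


def all_attributes(context):
    """Union of the attribute sets of all objects."""
    return frozenset().union(*context.values())


def extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def intent(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    return frozenset(
        m for m in all_attributes(context) if all(m in context[g] for g in objs)
    )


def concepts(context):
    """Brute force: close every attribute subset, keep distinct (extent, intent) pairs."""
    attrs = sorted(all_attributes(context))
    found = set()
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            objs = extent(frozenset(subset), context)
            found.add((objs, intent(objs, context)))
    return found


if __name__ == "__main__":
    # Print concepts from the largest extent to the smallest.
    ordered = sorted(concepts(CONTEXT), key=lambda c: (-len(c[0]), sorted(c[1])))
    for objs, attrs in ordered:
        print(sorted(objs), sorted(attrs))
```

Each returned pair is closed in both directions: the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes shared by the extent. Ordered by inclusion of extents, these pairs form the concept lattice.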

Empirical base of research:

For testing purposes we use synthetic data and widely used datasets from open data sources such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), the MovieLens service (https://movielens.org) and the ImhoNet service (http://imhonet.ru), as well as datasets gathered from social networks.

Results of research: 

The main results are:

  1. Formal methods for the construction and application of modern medical ontologies.
  2. New machine learning methods and context-aware collaborative filtering methods, together with their applications to real-world problems, including the following algorithms:
    • A context-aware recommender algorithm based on Boolean matrix factorization.
    • An algorithm for text classification into abstract classes based on discourse structure.
    • A lazy associative graph classification algorithm.
  3. Collection of a large number of information sources and test datasets within theoretical studies in FCA, clustering and biclustering, and text processing (more than 150 new publications and more than 60 GB of new collections of synthetic and real data), in collaboration with our partners: the D. Rogachev Federal Scientific and Clinical Centre of Pediatric Hematology, Oncology and Immunology (Russia), LORIA and LIRIS (France), etc.
  4. Increased efficiency of implementations of basic FCA algorithms, namely the calculation of stability indices and the estimation of the computational complexity of algorithms.
  5. Extension of DOD-DMS (Dynamical Ontology-Driven Data Mining System) with preprocessing of data from additional kinds of external data sources, more efficient intermediate storage of data collections with complex structure, and efficient text indexing of the natural-language fields of collection elements.
  6. Extension of the Formal Concept Analysis Research Toolbox (FCART, based on DOD-DMS) in the field of structural data processing, and adoption of a new distributed architecture for the system.
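
Item 4 mentions stability indices. The sketch below assumes the usual definition of the (intensional) stability of a concept (A, B): the fraction of subsets of the extent A whose common attributes are exactly B. The toy context and the names `intent` and `stability` are hypothetical. The direct enumeration is exponential in |A| and is shown only to make the definition concrete, not as the optimized implementation referred to in the list.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attribute set.
CONTEXT = {
    "doc1": {"a", "b"},
    "doc2": {"b", "c"},
    "doc3": {"a", "b", "c"},
}


def intent(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    every = frozenset().union(*context.values())
    return frozenset(m for m in every if all(m in context[g] for g in objs))


def stability(concept_extent, concept_intent, context):
    """Fraction of subsets of the extent whose intent equals the concept intent."""
    objs = sorted(concept_extent)
    target = frozenset(concept_intent)
    hits = 0
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            if intent(subset, context) == target:
                hits += 1
    return hits / 2 ** len(objs)


if __name__ == "__main__":
    # Concept ({doc1, doc3}, {a, b}): subsets {doc1} and {doc1, doc3} yield {a, b}.
    print(stability({"doc1", "doc3"}, {"a", "b"}, CONTEXT))  # 0.5
```

A value close to 1 means the intent survives the removal of most objects from the extent, which is why stability is often used to rank concepts by robustness.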

Level of implementation, recommendations on implementation or outcomes of the implementation of the results:

29 scientific papers were published in 2015.

The results apply to a spectrum of disciplines in which the analysis of large datasets is in high demand and inevitably requires the participation of a domain expert (medical informatics, bioinformatics, sociology, logistics, criminology, etc.).

The effectiveness, efficiency and correctness of the proposed models and methods are confirmed by comparative studies, testing and practical usage. The level of integration achieved varies across methods and software tools. New theoretical results in FCA have been implemented in FCART. The new functionality of FCART is actively used in teaching at the Faculty of Computer Science, NRU HSE, in the scientific studies of the lab, and in Nancy, Clermont-Ferrand and Nicosia.

The results of the research showed synergy effects from integrating several models and methods of data analysis within a unified intelligent information system. Further development of the FCART platform to increase the efficiency of scientific research is a core task for future work.

Publications:


Kashnitsky Y. S., Ignatov D. I. An ensemble machine learning method based on classifier recommendation (in Russian) // Intelligent Systems. Theory and Applications. 2015. Vol. 19. No. 4. P. 37-55.
Scedrov A., Barthe G., Fagerholm E., Fiore D., Schmidt B., Tibouchi M. Strongly-Optimal Structure Preserving Signatures from Type II Pairings: Synthesis and Lower Bounds, in: Public-Key Cryptography -- PKC 2015 Vol. 9020. Berlin : Springer, 2015. P. 355-376.
Kashnitsky Y., Sergei O. Kuznetsov. Lazy Associative Graph Classification, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI 2015) / Ed. by Sergei O. Kuznetsov, A. Napoli, S. Rudolph. Buenos Aires : , 2015. P. 63-74.
Masyutin A., Kashnitsky Y., Kuznetsov S. Lazy Classification with Interval Pattern Structures: Application to Credit Scoring, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI 2015) / Ed. by Sergei O. Kuznetsov, A. Napoli, S. Rudolph. Buenos Aires : , 2015. P. 43-54.
Galitsky B., Ilvovsky D., Kuznetsov S. Rhetoric map of an answer to compound queries, in: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference. Vol. 2: Short papers. Beijing : , 2015. P. 681-686.
Kuznetsov S., Makhalova T. Concept interestingness measures: a comparative study, in: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015 Vol. 1466. Clermont-Ferrand : CEUR Workshop Proceedings, 2015. P. 59-72.
Buzmakov A. V., Kuznetsov S., Napoli A. Fast Generation of Best Interval Patterns for Nonmonotonic Constraints, in: Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings Vol. 9285. Part 2. L., NY, Dordrecht, Heidelberg, Cham : Springer, 2015. P. 157-172.
Kaytoue M., Codocedo V., Buzmakov A. V., Baixeries J., Kuznetsov S., Napoli A. Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing, in: Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings Vol. 9286. Part III. L., NY, Dordrecht, Heidelberg, Cham : Springer, 2015. P. 227-231.
Buzmakov A. V., Kuznetsov S., Napoli A. Revisiting pattern structure projections, in: Formal Concept Analysis. 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings Vol. 9113. Springer, 2015. P. 200-215.
Ignatov D. I., Akhmatnurov M. Context-Aware Recommender System Based on Boolean Matrix Factorisation, in: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015 Vol. 1466. Clermont-Ferrand : CEUR Workshop Proceedings, 2015. P. 99-110.
Zudin S., Gnatyshak D. V., Ignatov D. I. Putting OAC-triclustering on MapReduce, in: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015 Vol. 1466. Clermont-Ferrand : CEUR Workshop Proceedings, 2015. P. 47-58.
Slezak D., Kashnitsky Y. S., Kuznetsov S. O. Infobright: SQL query optimization using rough set theory approximations (in Russian) // Information Systems and Technologies. 2015.
Ignatov D. I., Sarwar S. M., Hasan M., Billal M. Similarity Aggregation for Collaborative Filtering, in: Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers / Ed. by M. Y. Khachay, N. Konstantinova, A. Panchenko, D. I. Ignatov, V. Labunets. Vol. 542: Series: Communications in Computer and Information Science. Switzerland : Springer, 2015.
Neznanov A., Parinov A. Analyzing Social Networks Services Using Formal Concept Analysis Research Toolbox, in: CEUR Workshop Proceedings. Proceedings of the International Workshop on Social Network Analysis using Formal Concept Analysis (SNAFCA 2015) / Ed. by R. Missaoui, S. Kuznetsov, S. Obiedkov. Issue 1534: SNAFCA 2015 Social Network Analysis using Formal Concept Analysis. Malaga : CEUR Workshop Proceedings, 2015. Ch. 5. P. 43-54.
Galitsky B., Ilvovsky D., Kuznetsov S. Text integrity assessment: Sentiment profile vs rhetoric structure, in: Computational Linguistics and Intelligent Text Processing. 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II. Vol. 9042. Berlin : Springer, 2015. P. 126-139.
Galitsky B., Ilvovsky D., Kuznetsov S. Text Classification into Abstract Classes Based on Discourse Structure, in: Proceedings of the Recent Advances in Natural Language Processing, RANLP 2015. Hissar : , 2015. P. 201-207.
CEUR Workshop Proceedings. Proceedings of the International Workshop on Social Network Analysis using Formal Concept Analysis (SNAFCA 2015) / Ed. by R. Missaoui, S. Kuznetsov, S. Obiedkov. Issue 1534: SNAFCA 2015 Social Network Analysis using Formal Concept Analysis. Malaga : CEUR Workshop Proceedings, 2015.
Makhalova T., Ilvovsky D., Galitsky B. News clustering approach based on discourse text structure, in: ACL-IJCNLP 2015, Proceedings of the First Workshop on Computing News Storylines. Beijing : , 2015. P. 16-20.
Makhalova T., Ilvovsky D., Galitsky B. Pattern structures for news clustering, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI 2015) / Ed. by Sergei O. Kuznetsov, A. Napoli, S. Rudolph. Buenos Aires : , 2015. P. 35-42.