
Mathematical models, algorithms and software for data mining in the text and the structural form

Priority areas of development: mathematics
2014
Department: Scientific-Educational Laboratory for Intelligent Systems and Structural Analysis
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research: development of original knowledge representation models and intelligent data analysis methods, including new methods of Formal Concept Analysis (FCA), new triclustering algorithms, machine learning algorithms for classifying complex objects, structural mathematical models for representing natural-language texts, and others.

Methodology: discrete mathematics, computational logic, formal concept analysis, machine learning, data mining, computational linguistics, ontology modeling, theory of algorithms, software engineering.

Empirical base of research: data sets in the form of relational databases, collections of natural-language texts, and structured data in the form of graph models of different types.

Results of research:

1. Literature analysis and surveys covering formal concept analysis, recommender systems, ontological modeling, and the availability of open data sources. In the course of theoretical research on FCA, clustering, and text processing, the laboratory team has accumulated a substantial collection of benchmark datasets.

2. More efficient implementations of the basic FCA algorithms for constructing formal concept lattices and computing stability indices of formal concepts; the implementation has been applied to medical informatics tasks. New versions of methods and algorithms for clustering and classification on tricontexts have been developed; the implementation has been tested and used in online recommender systems.
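To make the terms in item 2 concrete, the following is a minimal brute-force sketch of formal concepts and the (intensional) stability index on a toy context. It is not the laboratory's optimized implementation; the context `context` and all function names are illustrative assumptions, and the enumeration is exponential in the number of objects, so it is only suitable for tiny examples.

```python
from itertools import combinations


def common_attributes(objects, context, all_attributes):
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes, context):
    """Derivation operator: objects that have every attribute in `attributes`."""
    return frozenset(g for g, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    all_attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, context, all_attributes)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts


def stability(extent, intent, context):
    """Share of subsets of the extent whose common attributes equal the intent."""
    all_attributes = frozenset().union(*context.values())
    members = sorted(extent)
    hits = sum(
        1
        for r in range(len(members) + 1)
        for subset in combinations(members, r)
        if common_attributes(subset, context, all_attributes) == intent
    )
    return hits / 2 ** len(members)


# Hypothetical toy context: object -> set of attributes.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
concepts = formal_concepts(context)
```

A concept whose intent survives the removal of many objects from its extent gets a stability close to 1, which is why the index is commonly used to rank concepts in a lattice.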

3. A prototype of the original software component for working with pattern structures, proposed in 2013, has been developed. It is integrated with classification tools, which is a step toward a universal system for studying classification problems over diverse complex attributes, including interval structures, sequences, and graphs.
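As an illustration of the interval structures mentioned in item 3, the sketch below shows the standard similarity (meet) operation on interval patterns: the componentwise convex hull of two interval vectors. It is a sketch under stated assumptions, not the prototype's code; the function names and the sample rows are hypothetical.

```python
from functools import reduce


def to_pattern(values):
    """Turn a numeric row into a vector of degenerate intervals [v, v]."""
    return tuple((v, v) for v in values)


def meet(p, q):
    """Similarity of two interval patterns: componentwise convex hull."""
    return tuple(
        (min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(p, q)
    )


def subsumes(general, specific):
    """True if every interval of `specific` lies inside the matching interval of `general`."""
    return all(
        ga <= sa and sb <= gb
        for (ga, gb), (sa, sb) in zip(general, specific)
    )


def common_description(rows):
    """Most specific interval pattern that covers all given numeric rows."""
    return reduce(meet, map(to_pattern, rows))


# Hypothetical numeric descriptions of two objects from the same class.
description = common_description([(5, 7), (6, 8)])
```

A new object can then be tested against a class description with `subsumes(description, to_pattern(row))`, which is one simple way to connect pattern structures to a classification task.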

4. Recent modifications of the DOD-DMS platform, which simplifies the construction of scientific and applied software systems for data analysis, especially the pre-processing of data from external sources, local storage of complex data, and efficient indexing of natural-language texts. Several subsystems of the automated research system FCART (Formal Concept Analysis Research Toolbox), which is aimed at researchers in FCA and related areas of discrete mathematics and data analysis, have been updated. The refined toolset includes tools for analyzing indices of formal concepts of any kind, tools for processing pattern structures, a report editor, and a built-in scripting language.

5. A methodological and technological basis for processing "big data" on the Internet. Several system designs for maintaining access to heterogeneous data sources have been proposed. The second version of FCART's local data storage subsystem, including a new user authorization scheme, has been created. Using this subsystem, a prototype for working with open data sources has been implemented.

Level of implementation, recommendations on implementation, or outcomes of implementing the results: the methods are applicable to the development of intelligent systems; the developed software can be used to analyze complex data in various fields (trial use has been conducted in health care, law enforcement, e-commerce, and Internet marketing).

Publications:


Neznanov A., Parinov A. About Universality and Flexibility of FCA-based Software Tools, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2014) / Ed. by S. Kuznetsov, A. Napoli, S. Rudolph. Vol. 1257. Prague : CEUR Workshop Proceedings, 2014. P. 59-66.
Maksimenkova O. V., Neznanov A. A., Podbelskiy V. V. On formative assessment and informative feedback in the design of programming courses (in Russian) // Вестник Российского университета дружбы народов. Серия: Информатизация образования. 2014. No. 4. P. 37-48.
Neznanov A., Ilvovsky D., Parinov A. Advancing FCA Workflow in FCART System for Knowledge Discovery in Quantitative Data, in: Procedia Computer Science. 2nd International Conference on Information Technology and Quantitative Management, ITQM 2014. National Research University Higher School of Economics (HSE) in Moscow (Russia) on June 3-5, 2014 / Ed. by Y. Shi, A. Lepskiy, F. T. Aleskerov. Vol. 31. Amsterdam : Elsevier, 2014. P. 201-210.
Galitsky B., Ilvovsky D., Kuznetsov S., Strok F. V. Finding Maximal Common Sub-parse Thickets for Multi-sentence Search, in: Graph Structures for Knowledge Representation and Reasoning: Third International Workshop, GKR 2013, Beijing, China, August 3, 2013. Revised Selected Papers / Ed. by M. Croitoru, S. Rudolph, S. Woltran, C. Gonzales. Berlin : Springer, 2014. P. 39-57.
Gadelshin I. F., Antonova A. Y., Ilvovsky D. Detection of Domain-Specific Trends in Text Collections, in: Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers / Ed. by D. I. Ignatov, M. Y. Khachay, A. Panchenko, N. Konstantinova, R. Yavorskiy. Vol. 439. Berlin : Springer, 2014. P. 78-84.
Poelmans J., Ignatov D. I., Kuznetsov S., Dedene G. Fuzzy and rough formal concept analysis: a survey // International Journal of General Systems. 2014. Vol. 43. No. 2. P. 105-134.
Kashnitsky Y. S. Visual analytics in the problem of triclustering multidimensional data (in Russian) // Труды Московского физико-технического института. 2014. Vol. 6. No. 3. P. 43-56.
Ignatov D. I., Kaminskaya A. Y., Konstantinova N., Konstantinov A. V. Recommender system for crowdsourcing platform Witology, in: Proceedings of The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2014, 11-14 August 2014 Warsaw, Poland / Ed. by D. Slezak, H. S. Nguyen, M. Reformat, S. J. Eugene. Los Alamitos, Washington, Tokyo : IEEE Computer Society, 2014. P. 327-335.
Gnatyshak D. V., Ignatov D. I., Kuznetsov S., Nourine L. A One-Pass Triclustering Approach: Is There any Room for Big Data?, in: CLA 2014: Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications. Kosice : Pavol Jozef Safarik University, 2014. P. 231-242.
Gnatyshak D. V. Greedy Modifications of OAC-triclustering Algorithm, in: Procedia Computer Science. 2nd International Conference on Information Technology and Quantitative Management, ITQM 2014. National Research University Higher School of Economics (HSE) in Moscow (Russia) on June 3-5, 2014 / Ed. by Y. Shi, A. Lepskiy, F. T. Aleskerov. Vol. 31. Amsterdam : Elsevier, 2014. P. 1116-1123.
Penikas H. I., Petrov V. S., Anokhina M. V. Identifying SIFI Determinants for Global Banks and Insurance Companies: Implications for D-SIFIs in Russia / University of Pavia (Italy). Series ISSN: 2281-1346 "DEM Working Paper Series". 2014. No. 85.
Ignatov D. I., Zhuk R., Konstantinova N. Learning hypotheses from triadic labeled data, in: Proceedings of The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2014, 11-14 August 2014 Warsaw, Poland / Ed. by D. Slezak, H. S. Nguyen, M. Reformat, S. J. Eugene. Los Alamitos, Washington, Tokyo : IEEE Computer Society, 2014. P. 474-480.
Zhuk R., Ignatov D. I., Konstantinova N. Concept Learning from Triadic Data // Procedia Computer Science. 2014. Vol. 31. P. 928-938.
Ignatov D. I., Kaminskaya A. Y., Malioukov A., Konstantinova N., Poelmans J. FCA-Based Recommender Models and Data Analysis for Crowdsourcing Platform Witology, in: Proceedings of International Conference on Conceptual Structures 2014 Vol. 8577: Graph-Based Representation and Reasoning. Springer, 2014. P. 287-292.
Ignatov D. I. Draft of a textbook chapter. A survey of recommender system methods (in Russian), in: Модели и методы анализа данных. Юрайт, 2014. P. 1-42.
Ignatov D. I., Nenova E. N., Konstantinov A. V., Konstantinova N. S. Boolean Matrix Factorisation for Collaborative Filtering: An FCA-Based Approach, in: Artificial Intelligence: Methodology, Systems, and Applications 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings / Ed. by G. Agre, P. Hitzler, A. A. Krisnadhi, S. Kuznetsov. Vol. 8722. L., NY, Dordrecht, Heidelberg, Cham : Springer, 2014. P. 47-58.
Korepanova N., Kuznetsov S., Karachunskiy A. I. Matchings and Decision Trees for Determining Optimal Therapy, in: Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers / Ed. by D. I. Ignatov, M. Y. Khachay, A. Panchenko, N. Konstantinova, R. Yavorskiy. Vol. 439. Berlin : Springer, 2014. P. 101-110.
Kashnitsky Y., Ignatov D. I. Can FCA-based Recommender System Suggest a Proper Classifier?, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2014) / Ed. by S. Kuznetsov, A. Napoli, S. Rudolph. Vol. 1257. Prague : CEUR Workshop Proceedings, 2014. Ch. 3. P. 17-26.
Chernyak E. L. An approach to the problem of annotation of research publications, in: Proceedings of The Eighth International Conference on Web Search and Data Mining. NY, United States of America : ACM, 2014. Ch. 58. P. 429-434.
Neznanov A., Parinov A. FCA Analyst Session and Data Access Tools in FCART, in: Artificial Intelligence: Methodology, Systems, and Applications 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings / Ed. by G. Agre, P. Hitzler, A. A. Krisnadhi, S. Kuznetsov. Vol. 8722. L., NY, Dordrecht, Heidelberg, Cham : Springer, 2014. P. 214-221.
Kashnitsky Y. Recommender-based Multiple Classifier System, in: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. PhD Session Proceedings. 2014. P. 21-30.
Slezak D., Kashnitsky Y. S., Kuznetsov S. O. Rough sets for SQL query optimization (in Russian) // Открытые системы. СУБД. 2014. No. 10. P. 44-45.
Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014) / Ed. by D. I. Ignatov, M. Y. Khachay, A. Panchenko, N. Konstantinova, R. Yavorsky, D. Ustalov. Vol. 1197: Supplementary Proceedings of AIST 2014. Ekaterinburg : CEUR Workshop Proceedings, 2014.
Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2014) / Ed. by S. Kuznetsov, A. Napoli, S. Rudolph. Vol. 1257. Prague : CEUR Workshop Proceedings, 2014.
Kashnitsky Y. Visual analytics in FCA-based triclustering, in: Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014) / Ed. by D. I. Ignatov, M. Y. Khachay, A. Panchenko, N. Konstantinova, R. Yavorsky, D. Ustalov. Vol. 1197: Supplementary Proceedings of AIST 2014. Ekaterinburg : CEUR Workshop Proceedings, 2014. Ch. 12. P. 69-80.
Kaytoue M., Kuznetsov S., Macko J., Napoli A. Biclustering meets triadic concept analysis // Annals of Mathematics and Artificial Intelligence. 2014. Vol. 70. No. 1. P. 55-79.