
Explanation-oriented Methods of Data Analysis for Semantically Rich Data and Their Applications

Priority areas of development: mathematics
2017
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research:

The research aims to develop new mathematical models, algorithms and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, machine learning algorithms for classifying complex objects, and others. The developed methods, algorithms and software tools will be applied to practical tasks.

Thus, the object of the research comprises methods, algorithms and software tools for data mining and visualization, ontology modelling, automatic text processing, etc. The subject of the research is the properties of these methods and algorithms, such as scope of application, precision and performance, with a special interest in interpretability (explainability).

Methodology:

The research is based on methods of discrete mathematics, computer science, computational linguistics and software engineering. First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. We then implement original methods and algorithms in various components of intelligent software systems. These implementations can be tested on synthetic tasks and adopted in practical applications.
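As an illustration of the FCA models mentioned above, the following is a minimal sketch of the two derivation operators and the closure they induce on attribute sets. The toy context (`doc1`, `doc2`, `doc3` and their attributes) and all function names are assumptions made up for this example; they are not taken from the project's software, and the brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attributes.
CONTEXT = {
    "doc1": {"fca", "text"},
    "doc2": {"fca", "graph"},
    "doc3": {"fca", "text", "graph"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Return the objects that have every attribute in attrs."""
    return frozenset(obj for obj, has in CONTEXT.items() if attrs <= has)


def intent(objs):
    """Return the attributes shared by every object in objs."""
    shared = set(ATTRIBUTES)
    for obj in objs:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def closure(attrs):
    """Closure of an attribute set: the intent of its extent."""
    return intent(extent(attrs))


def concepts():
    """Enumerate all formal concepts (extent, intent) by brute force.

    Every attribute subset is closed; distinct closed sets are the intents.
    This is exponential in the number of attributes.
    """
    found = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            closed = closure(frozenset(subset))
            found.add((extent(closed), closed))
    return found


if __name__ == "__main__":
    for ext, itt in sorted(concepts(), key=lambda c: len(c[1])):
        print(sorted(ext), sorted(itt))
```

Each resulting pair is a set of objects together with exactly the attributes they share, which is what makes concept-based models comparatively easy for a domain expert to read.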

Empirical base of research:

For testing purposes, we use synthetic data as well as data gathered from electronic scientific libraries, social media services, collections of healthcare records, NRU HSE students’ works, and open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).

Results of research:

The results of the research were presented in 20 scientific papers from December 2016 to November 2017. The main results are:

1. New algorithms for text fragments classification and similarity analysis based on syntactic and discourse structures of fragments.

2. Advances in relevance analysis of texts based on annotated suffix trees.

3. Advances in original lazy classification methods applied to clinical informatics tasks including oncology therapy optimization.

4. Advances in models for prediction of natural history of breast cancer.

5. Implementation of new approaches to interpretation and analysis of frequent closed sets of attributes.

6. In-depth research on educational data mining in adaptive learning.

7. New methods of automated assessment of mind maps.

8. New strategies and technologies for the deployment of container nodes to gather data from external data sources.

The level of implementation, recommendations on implementation or outcomes of the implementation of the results:

The obtained results apply to a spectrum of disciplines where analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency and correctness of the proposed models and methods are supported by comparative studies, testing and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning and text processing underlie almost all modern semantic technologies. Domain experts considered the practical implementations of the proposed data analysis methods to be well explainable.

The conducted research produced a synergy effect across several international collaborative projects of the ISSA Lab and made it possible to adopt models and methods of data analysis in practical tasks in conjunction with Gemotest Laboratory, the Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), LORIA and LIRIS (France), TU-Dresden (Germany), etc.

Publications:


Kanovich M., Brotherston J., Gorogiannis N. Biabduction (and Related Problems) in Array Separation Logic, in: 26th International Conference on Automated Deduction – CADE 26. Springer, 2017. P. 472-490.
Kanovich M., Scedrov A., Kirigin T. B., Nigam V., Talcott C. Time, computational complexity, and probability in the analysis of distance-bounding protocols // Journal of Computer Security. 2017. Vol. 25. No. 6. P. 585-630.
Masyutin A., Kashnitsky Y. Query-Based Versus Tree-Based Classification: Application to Banking Data, in: Foundations of Intelligent Systems. Warsaw : Springer International Publishing, 2017. P. 664-673.
Kanovich M., Kirigin T. B., Nigam V., Scedrov A., Talcott C., Perovic R. A rewriting framework and logic for activities subject to regulations // Mathematical Structures in Computer Science. 2017. Vol. 27. No. 3. P. 332-375.
Maksimenkova O. V., Neznanov A., Papushina I. O., Parinov A. On mind maps evaluation: a case of an automatic grader development // Advances in Intelligent Systems and Computing. 2018. Vol. 2. P. 210-221.
Zakharyaschev M., Bresolin D., Kurucz A., Muñoz-Velasco E., Ryzhikov V., Sciavicco G. Horn fragments of the Halpern-Shoham interval temporal logic // ACM Transactions on Computational Logic (TOCL). 2017. Vol. 18. No. 3. P. 1-39.
Kanovich M., Scedrov A., Kuznetsov S. Undecidability of the Lambek Calculus with Subexponential and Bracket Modalities, in: 21st International Symposium, Fundamentals of Computation Theory 2017, FCT 2017. Springer Berlin Heidelberg, 2017. P. 326-340.
Maksim Yu., Ignatov D. I. Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition Pairs that Compress // Lecture Notes in Computer Science. 2017. Vol. 10314. P. 558-569.
Makarov I., Bulanov O., Zhukov L. E. Co-author Recommender System, in: Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics / Ed. by V. A. Kalyagin, A. I. Nikolaev, P. M. Pardalos, O. Prokopyev. Vol. 197. Springer International Publishing, 2017. P. 251-257.
Belfodil A., Kuznetsov S., Robardet C., Kaytoue M. Mining convex polygon patterns with Formal Concept Analysis, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, 19-25 August 2017. Melbourne : International Joint Conferences on Artificial Intelligence, 2017. P. 1425-1432.
Borchmann D., Hanika T., Obiedkov S. On the Usability of Probably Approximately Correct Implication Bases, in: Formal Concept Analysis: 14th International Conference, ICFCA 2017, Rennes, France, June 13-16, 2017, Proceedings. Vol. 10308. Cham : Springer International Publishing, 2017. P. 72-88.
Makarov I., Konoplya O., Polyakov P., Martynov M., Zyuzin P., Gerasimova O., Bodishtianu V. Adapting First-Person Shooter Video Game for Playing with Virtual Reality Headsets, in: Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017, Marco Island, Florida, USA, May 22-24, 2017. Palo Alto : AAAI Press, 2017. ISBN 978-1-57735-787-2. P. 412-415.
Tyuryumina E., Neznanov A. On Consolidated Predictive Model of the Natural History of Breast Cancer: Primary Tumor and Secondary Metastases in Patients with Lymph Nodes Metastases, in: Proceedings of the 2017 International Conference on Digital Health. NY : Association for Computing Machinery (ACM), 2017. P. 60-66.
Tyuryumina E., Neznanov A. On Consolidated Predictive Model of the Natural History of Breast Cancer Considering Primary Tumor and Primary Distant Metastases Growth, in: 2017 IEEE International Conference on Healthcare Informatics. IEEE Computer Society, 2017. P. 484-489.
Galitsky B., Ilvovsky D. Chatbot with a Discourse Structure-Driven Dialogue Management, in: Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2017. P. 87-90.
Babin M. A., Kuznetsov S. Dualization in lattices given by ordered sets of irreducibles // Theoretical Computer Science. 2017. Vol. 658, Part B (7 January). P. 316-326.
Ignatov D. I., Semenov A., Komissarova D. V., Gnatyshak D. V. Multimodal Clustering for Community Detection, in: Formal Concept Analysis of Social Networks / Ed. by R. Missaoui, S. Kuznetsov, S. Obiedkov. Springer, 2017. Ch. 4. P. 59-96.
Shishkova A., Chernyak E. L. Annotated Suffix Tree Method for German Compound Splitting, in: CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016 / Ed. by E. L. Chernyak, D. Ilvovsky, D. Skorinkin, A. Vybornova. Vol. 1886. Aachen : CEUR Workshop Proceedings, 2017. P. 42-47.
Chernyak E. L. Comparison of String Similarity Measures for Obscenity Filtering, in: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Stroudsburg, PA : The Association for Computational Linguistics, 2017. P. 97-101.
Ponomareva M. A., Milintsevich K., Chernyak E. L., Starostin A. Automated Word Stress Detection in Russian, in: Proceedings of the First Workshop on Subword and Character Level Models in NLP. Stroudsburg, PA : Association for Computational Linguistics, 2017. P. 31-35.
Makhalova T., Kuznetsov S. On Overfitting of Classifiers Making a Lattice, in: Formal Concept Analysis: 14th International Conference, ICFCA 2017, Rennes, France, June 13-16, 2017, Proceedings. Vol. 10308. Cham : Springer International Publishing, 2017. P. 184-197.
Maksimenkova O. V., Neznanov A., Skryabin M. On MOOCs Quality Estimation: a Case of Modern Nonparametric Superiority and Noninferiority Statistical Tests, in: eLearning Stakeholders and Researchers Summit 2017. Proceedings of the International Conference / Ed. by E. Yu. Kulik, U. Kuskin. Moscow : National Research University Higher School of Economics, 2017. P. 165-174.
Egurnov D., Ignatov D. I., Mephu Nguifo E. On Containment of Triclusters Collections Generated by Quantified Box Operators, in: 23rd International Symposium on Methodologies for Intelligent Systems - Proceedings. Birkhauser/Springer, 2017. P. 573-579.
Ignatov D. I. On closure operators related to maximal tricliques in tripartite hypergraphs // Discrete Applied Mathematics. 2017. P. 1-28.
Papushina I. O., Maksimenkova O. V., Kolomiets A. Digital Educational Mind Maps: a Computer Supported Collaborative Learning Practice on Marketing Master Program, in: Advances in Intelligent Systems and Computing. Interactive Collaborative Learning. Proceedings of the 19th ICL Conference. Vol. 1. NY : Springer International Publishing, 2016. P. 17-30.
Korepanova N., Kuznetsov S. Pattern Structures for Risk Group Identification, in: Formal Concept Analysis for Knowledge Discovery. Proceedings of International Workshop on Formal Concept Analysis for Knowledge Discovery (FCA4KD 2017), Moscow, Russia, June 1, 2017. / Ed. by S. Kuznetsov, B. W. Watson. Vol. 1921. CEUR-WS.org, 2017. P. 13-21.
Kanovich M., Kuznetsov S., Scedrov A., Morrill G. A Polynomial-Time Algorithm for the Lambek Calculus with Brackets of Bounded Order, in: Second International Conference on Formal Structures for Computation and Deduction, FSCD 2017. Vol. 84. 2017.