Explanation-oriented Methods of Data Analysis for Semantically Rich Data and Their Applications

Priority areas of development: mathematics
2017

Goal of research:

The research aims to develop new mathematical models, algorithms and software tools for solving data mining and knowledge discovery problems on data with complex structure, including text mining, graph mining, and machine learning for the classification of complex objects. The developed methods, algorithms and software tools will be applied to practical tasks.

Thus, the object of the research comprises methods, algorithms and software tools for data mining and visualization, ontology modelling, automatic text processing, etc. The subject of the research is the properties of these methods and algorithms, such as scope of application, precision and performance, with special interest in interpretability (explainability).

Methodology:

The research is based on methods of discrete mathematics, computer science, computational linguistics and software engineering. First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. Then we implement original methods and algorithms in various components of intelligent software systems. Such implementations can be tested on synthetic tasks and adopted in practical applications.
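Formal Concept Analysis is named above as one of the fundamental models. As a generic illustration only (the toy context and all names below, such as `CONTEXT`, `extent`, `intent` and `formal_concepts`, are hypothetical and not taken from the project's software), the following Python sketch applies the two FCA derivation operators to a tiny binary object-attribute table and enumerates its formal concepts, i.e. pairs (extent, intent) in which each component is the derivation of the other. It closes every attribute subset, which is exponential in the number of attributes; practical FCA tools use dedicated algorithms such as Ganter's NextClosure instead.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the set of attributes it has.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Derivation A': the objects that have every attribute in attrs."""
    return frozenset(g for g, row in CONTEXT.items() if attrs <= row)


def intent(objs):
    """Derivation B': the attributes shared by every object in objs.

    By convention, the empty set of objects derives to all attributes.
    """
    if not objs:
        return ATTRIBUTES
    return frozenset(set.intersection(*(CONTEXT[g] for g in objs)))


def formal_concepts():
    """Naive enumeration: every concept is (A', A'') for some attribute subset A."""
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), k):
            ext = extent(frozenset(subset))
            concepts.add((ext, intent(ext)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest extent to the smallest.
    for ext, itt in sorted(formal_concepts(), key=lambda c: (-len(c[0]), sorted(c[0]))):
        print(sorted(ext), sorted(itt))
```

For this context the sketch finds four concepts, for example ({g1, g3}, {a, b}): g1 and g3 are exactly the objects having both a and b, and a and b are exactly the attributes they share. Because each concept pairs a group of objects with an explicit attribute description, concepts of this kind are directly readable, which is in line with the interest in interpretability stated above.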

Empirical base of research:

For testing purposes, we use synthetic data as well as data gathered from electronic scientific libraries, social media services, collections of healthcare records, NRU HSE students' works, open data repositories such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), and other sources.

Results of research:

The results of the research were published in 20 scientific papers between December 2016 and November 2017. The main results are:

1. New algorithms for classification and similarity analysis of text fragments based on their syntactic and discourse structures.

2. Advances in relevance analysis of texts based on annotated suffix trees.

3. Advances in original lazy classification methods applied to clinical informatics tasks, including optimization of oncology therapy.

4. Advances in models for predicting the natural history of breast cancer.

5. Implementation of new approaches to interpretation and analysis of frequent closed sets of attributes.

6. In-depth research on educational data mining for adaptive learning.

7. New methods of automated assessment of mind maps.

8. New strategies and technologies for the deployment of container nodes to gather data from external data sources.
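Result 5 above concerns frequent closed sets of attributes. For readers unfamiliar with the notion, the following Python sketch is a generic brute-force illustration on a made-up dataset (the names `ROWS`, `support`, `closure` and `frequent_closed_sets` are hypothetical, and this is not the project's own approach). It lists the attribute sets that occur in at least `min_support` objects and are closed, meaning that no further attribute occurs in every object containing them. Dedicated miners such as CHARM or LCM avoid enumerating all subsets.

```python
from itertools import combinations

# Hypothetical toy dataset: each row (object) is the set of attributes it has.
ROWS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c", "d"},
]


def support(itemset, rows):
    """Number of rows that contain every attribute of itemset."""
    return sum(1 for row in rows if itemset <= row)


def closure(itemset, rows):
    """Attributes common to all rows containing itemset (all attributes if none do)."""
    covering = [row for row in rows if itemset <= row]
    if not covering:
        return frozenset().union(*rows)
    return frozenset(set.intersection(*covering))


def frequent_closed_sets(rows, min_support):
    """Brute force: keep every attribute set that is frequent and equal to its closure."""
    attributes = sorted(frozenset().union(*rows))
    result = {}
    for k in range(len(attributes) + 1):
        for combo in combinations(attributes, k):
            itemset = frozenset(combo)
            count = support(itemset, rows)
            if count >= min_support and closure(itemset, rows) == itemset:
                result[itemset] = count
    return result


if __name__ == "__main__":
    found = frequent_closed_sets(ROWS, 2)
    for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With `min_support=2` this dataset yields four frequent closed sets: {a} (support 4), {a, b} (3), {a, c} (3) and {a, b, c} (2). The set {b} is frequent but not closed, since every object containing b also contains a. Closed sets summarize all frequent sets without losing support information, which makes them a compact starting point for interpretation.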

Level of implementation, recommendations on implementation, or outcomes of implementing the results:

The obtained results apply to a range of disciplines in which the analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency and correctness of the proposed models and methods are supported by comparative studies, testing and practical adoption. The level of implementation varies across methods and software tools. New theoretical results in FCA, machine learning and text processing underlie almost all modern semantic technologies. In practical implementations, domain experts found the proposed data analysis methods to be well explainable.

The research created a synergy among several international collaborative projects of the ISSA Lab and made it possible to adopt data analysis models and methods in practical tasks in conjunction with Gemotest Laboratory, Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology (Russia), LORIA and LIRIS (France), TU-Dresden (Germany), etc.

Publications:


Papushina I. O., Maksimenkova O. V., Kolomiets A. Digital Educational Mind Maps: a Computer Supported Collaborative Learning Practice on Marketing Master Program, in: Advances in Intelligent Systems and Computing. Interactive Collaborative Learning: Proceedings of the 19th ICL Conference. New York : Springer International Publishing, 2016. P. 17-30.
Babin M. A., Kuznetsov S. Dualization in lattices given by ordered sets of irreducibles // Theoretical Computer Science. 2017. Vol. 658, Part B (7 January). P. 316-326.
Kanovich M., Kirigin T. B., Nigam V., Scedrov A., Talcott C., Perovic R. A rewriting framework and logic for activities subject to regulations // Mathematical Structures in Computer Science. 2017. Vol. 27. No. 3. P. 332-375.
Kanovich M., Scedrov A., Kirigin T. B., Nigam V., Talcott C. Time, computational complexity, and probability in the analysis of distance-bounding protocols // Journal of Computer Security. 2017. Vol. 25. No. 6. P. 585-630.
Egurnov D., Ignatov D. I., Mephu N. E. On Containment of Triclusters Collections Generated by Quantified Box Operators, in: 23rd International Symposium on Methodologies for Intelligent Systems - Proceedings. Birkhauser/Springer, 2017. P. 573-579.
Ignatov D. I., Yurov M. Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition Pairs that Compress, in: Rough Sets - International Joint Conference, IJCRS 2017, Olsztyn, Poland, July 3-7, 2017, Proceedings, Part II. Springer, 2017. P. 558-569.
Galitsky B., Ilvovsky D. Chatbot with a Discourse Structure-Driven Dialogue Management, in: Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2017. P. 87-90.
Kanovich M., Kuznetsov S., Morrill G., Scedrov A. A Polynomial-Time Algorithm for the Lambek Calculus with Brackets of Bounded Order, in: Second International Conference on Formal Structures for Computation and Deduction, FSCD 2017. 2017. P. 22:1-22:17.
Kanovich M., Brotherston J., Gorogiannis N. Biabduction (and Related Problems) in Array Separation Logic, in: 26th International Conference on Automated Deduction, CADE 26. Springer, 2017. P. 472-490.
Kanovich M., Scedrov A., Kuznetsov S. Undecidability of the Lambek Calculus with Subexponential and Bracket Modalities, in: 21st International Symposium, Fundamentals of Computation Theory 2017, FCT 2017. Springer Berlin Heidelberg, 2017. P. 326-340.
Korepanova N., Kuznetsov S. Pattern Structures for Risk Group Identification, in: Formal Concept Analysis for Knowledge Discovery. Proceedings of International Workshop on Formal Concept Analysis for Knowledge Discovery (FCA4KD 2017), Moscow, Russia, June 1, 2017. CEUR-WS.org, 2017. P. 13-21.
Belfodil A., Kuznetsov S., Robardet C., Kaytoue M. Mining convex polygon patterns with Formal Concept Analysis, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, 19-25 August 2017. Melbourne : International Joint Conferences on Artificial Intelligence, 2017. P. 1425-1432.
Makarov I., Konoplya O., Pavel P., Maxim M., Zyuzin P., Gerasimova O., Bodishtianu V. Adapting First-Person Shooter Video Game for Playing with Virtual Reality Headsets, in: Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017, Marco Island, Florida, USA, May 22-24, 2017. ISBN 978-1-57735-787-2. Palo Alto : AAAI Press, 2017. P. 412-415.
Makarov I., Bulanov O., Zhukov L. E. Co-author Recommender System, in: Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics. Springer, 2017. P. 251-257.
Maksimenkova O. V., Neznanov A., Skryabin M. On MOOCs Quality Estimation: a Case of Modern Nonparametric Superiority and Noninferiority Statistical Tests, in: eLearning Stakeholders and Researchers Summit 2017. Proceedings of the International Conference. Moscow : National Research University Higher School of Economics, 2017. P. 165-174.
Makhalova T., Kuznetsov S. On Overfitting of Classifiers Making a Lattice, in: Formal Concept Analysis: 14th International Conference, ICFCA 2017, Rennes, France, June 13-16, 2017, Proceedings. Cham : Springer International Publishing, 2017. P. 184-197.
Borchmann D., Hanika T., Obiedkov S. On the Usability of Probably Approximately Correct Implication Bases, in: Formal Concept Analysis: 14th International Conference, ICFCA 2017, Rennes, France, June 13-16, 2017, Proceedings. Cham : Springer International Publishing, 2017. P. 72-88.
Пономарева М. А., Milintsevich K., Artemova E., Starostin A. Automated Word Stress Detection in Russian, in: Proceedings of the First Workshop on Subword and Character Level Models in NLP. Stroudsburg, PA : Association for Computational Linguistics, 2017. P. 31-35.
Ella Y. T., Neznanov A. On Consolidated Predictive Model of the Natural History of Breast Cancer Considering Primary Tumor and Primary Distant Metastases Growth, in: 2017 IEEE International Conference on Healthcare Informatics. IEEE Computer Society, 2017. P. 484-489.
Ella Y. T., Neznanov A. On Consolidated Predictive Model of the Natural History of Breast Cancer: Primary Tumor and Secondary Metastases in Patients with Lymph Nodes Metastases, in: Proceedings of the 2017 International Conference on Digital Health. New York : Association for Computing Machinery (ACM), 2017. P. 60-66.
Shishkova A., Artemova E. Annotated Suffix Tree Method for German Compound Splitting, in: CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016. Aachen : CEUR Workshop Proceedings, 2017. P. 42-47.
Zakharyaschev M., Bresolin D., Kurucz A., Muñoz-Velasco E., Ryzhikov V., Sciavicco G. Horn fragments of the Halpern-Shoham interval temporal logic, in: ACM Transactions on Computational Logic (TOCL). New York : ACM, 2017. P. 1-39.
Ignatov D. I., Semenov A., Комиссарова Д. В., Gnatyshak D. V. Multimodal Clustering for Community Detection, in: Formal Concept Analysis of Social Networks. Springer, 2017. P. 59-96.
Masyutin A., Kashnitsky Y. Query-Based Versus Tree-Based Classification: Application to Banking Data, in: Foundations of Intelligent Systems. Warszawa : Springer International Publishing, 2017. P. 664-673.
Artemova E. Comparison of String Similarity Measures for Obscenity Filtering, in: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Stroudsburg, PA : The Association for Computational Linguistics, 2017. P. 97-101.
Alexey N. New Reality in Clinical Informatics and Explanation-oriented Methods of Data Analysis, in: Proceedings of the First Workshop on Data Analysis in Medicine (WDAM-2017). EasyChair, 2018. P. 43-47.