Mining Data with Complex Structure and Semantic Technologies

Priority areas of development: mathematics
2016
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research:

The research aims to develop new mathematical models, algorithms and software tools for data mining and knowledge discovery in data with complex structure, including text mining, graph mining, machine learning for classification of complex objects, and others. The developed methods, algorithms and software tools are to be applied to practical tasks.

Thus, the object of the research is methods, algorithms and software tools for data mining and visualization, ontology modelling, automatic text processing, etc. The subject of the research is the properties of these methods and algorithms, such as scope of application, precision and performance.

Methodology:

The research is based on methods of discrete mathematics, computer science, computational linguistics and software engineering. First, we consider fundamental mathematical models based on Formal Concept Analysis (FCA), clustering, machine learning and applied graph theory. Second, we use methods of automatic text processing and ontology modelling. Then we implement original methods and algorithms in various components of intelligent software. Such implementations can be tested on synthetic tasks and adopted in practical applications.
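Formal Concept Analysis, the first of the models named above, starts from a formal context (G, M, I): a set of objects G, a set of attributes M and an incidence relation I between them. A formal concept is a pair (A, B) with A ⊆ G and B ⊆ M such that A is exactly the set of objects having all attributes in B, and B is exactly the set of attributes shared by all objects in A. The following is a minimal, naive Python sketch of these definitions. The toy context is hypothetical, and the brute-force enumeration (exponential in |M|) is only illustrative; it is not the algorithm used in the project's software such as FCART.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the set of attributes it has.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = {"a", "b", "c"}


def extent(attrs, context):
    """Derivation B': all objects that have every attribute in attrs."""
    return frozenset(g for g, row in context.items() if attrs <= row)


def intent(objs, context, attributes):
    """Derivation A': all attributes shared by every object in objs."""
    common = set(attributes)  # the derivation of the empty object set is M
    for g in objs:
        common &= context[g]
    return frozenset(common)


def formal_concepts(context, attributes):
    """Naively enumerate all formal concepts (A, B) with A' = B and B' = A.

    Every concept intent equals the closure B'' of some attribute subset B,
    so closing each subset of M finds all concepts (exponential in |M|).
    """
    concepts = set()
    attrs = sorted(attributes)
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            a = extent(set(subset), context)   # B'
            b = intent(a, context, attributes)  # B''
            concepts.add((a, b))
    return concepts


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(CONTEXT, ATTRIBUTES), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

For this toy context the sketch finds four concepts, for example ({g1, g3}, {a, b}). Practical FCA tools rely on far more efficient algorithms for building concept lattices.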

Empirical base of research:

For testing purposes we use synthetic data and widely used datasets from open data sources such as the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), the MovieLens service (https://movielens.org), the ImhoNet service (http://imhonet.ru), collected social network datasets, etc.

Results of research:

The results of the research were published in 26 scientific papers during December 2015 – November 2016. The main results are:

  1. New machine learning methods for lazy classification of objects with complex structure based on pattern structures.
  2. New collaborative filtering methods with an aggregated similarity measure.
  3. A new algorithm for classification and similarity analysis of text fragments based on their syntactic and discourse structures.
  4. Application of the original lazy classification and attribute exploration methods to clinical informatics tasks, including treatment optimization in oncology.
  5. New models and algorithms for predicting the natural history of breast cancer.
  6. In-depth study of learning analytics and educational data mining tasks and methods.
  7. Implementation of new complex data preprocessing subsystems of the Formal Concept Analysis Research Toolbox (FCART) targeting natural language text processing tasks.
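Several of the results above rely on lazy classification with pattern structures, where no model is built in advance and hypotheses are formed only when a query object arrives. The sketch below illustrates the idea on numeric data with interval pattern structures, in which the similarity of two objects is, per feature, the smallest interval covering both values. It is a simplified, hypothetical illustration: the function names, the toy data and the plain majority-vote rule are assumptions for exposition, not the exact procedure of the cited papers.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Description = List[Interval]


def meet(x: Sequence[float], g: Sequence[float]) -> Description:
    """Interval-pattern similarity: per feature, the smallest interval covering both values."""
    return [(min(a, b), max(a, b)) for a, b in zip(x, g)]


def covers(pattern: Description, obj: Sequence[float]) -> bool:
    """True if every feature value of obj lies inside the corresponding interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(pattern, obj))


def lazy_classify(x, positives, negatives) -> str:
    """Classify x at query time by voting (hypothetical simple rule).

    A positive training object g votes '+' if meet(x, g) covers no negative
    object, i.e. the hypothesis it forms with x is consistent; negatives
    vote '-' symmetrically. Ties are left undecided.
    """
    pos_votes = sum(
        1 for g in positives
        if not any(covers(meet(x, g), n) for n in negatives)
    )
    neg_votes = sum(
        1 for g in negatives
        if not any(covers(meet(x, g), p) for p in positives)
    )
    if pos_votes > neg_votes:
        return "+"
    if neg_votes > pos_votes:
        return "-"
    return "?"


# Hypothetical toy training data with two numeric features.
POSITIVES = [[1.0, 1.0], [2.0, 1.5]]
NEGATIVES = [[8.0, 9.0], [9.0, 8.0], [5.0, 5.0]]

if __name__ == "__main__":
    print(lazy_classify([1.5, 1.2], POSITIVES, NEGATIVES))
    print(lazy_classify([8.5, 8.5], POSITIVES, NEGATIVES))
```

With the toy data assumed here, the query [1.5, 1.2] receives only positive votes, while [8.5, 8.5] is classified as negative because the intervals it forms with the positive examples also cover the negative point [5.0, 5.0]. Training costs nothing; all work is moved to query time, which is what makes the approach lazy.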

Level of implementation, recommendations on implementation, or outcomes of the implementation of the results:

The results are applicable across a range of disciplines where the analysis of datasets with complex structure is in high demand and inevitably requires the participation of a domain expert (medical informatics, education, sociology, logistics, criminology, etc.).

The effectiveness, efficiency and correctness of the proposed models and methods are supported by comparative studies, testing and practical adoption. The level of implementation varies across methods and software tools. The new theoretical results in FCA, machine learning and text processing underlie almost all semantic technologies. The new functionality of FCART is actively used in an electronic library analysis project.

The research produced a synergy among several international collaborative projects of the ISSA Lab and made it possible to apply data analysis models and methods to practical tasks in conjunction with the D. Rogachev Federal Scientific and Clinical Centre of Pediatric Hematology, Oncology and Immunology (Russia), LORIA and LIRIS (France), and TU Dresden (Germany).

Publications:


Bocharov A. A., Gnatyshak D. V., Ignatov D. I., Mirkin B., Shestakoff A. A Lattice-based Consensus Clustering Algorithm, in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings / Ed. by M. Huchard, S. Kuznetsov. Vol. 1624. M. : Higher School of Economics, National Research University, 2016. P. 45-56.
Chernyak E. L., Ilvovsky D. Annotated suffix trees for text clustering, in: The 3rd International Workshop on Concept Discovery in Unstructured Data (CDUD 2016). Proceedings of the Third Workshop on Concept Discovery in Unstructured Data co-located with the 13th International Conference on Concept Lattices and Their Applications (CLA 2016), Moscow, Russia, July 18, 2016. CEUR Workshop Proceedings Vol. 1625. Aachen : CEUR Workshop Proceedings, 2016. P. 25-31.
Parinov A., Neznanov A. Unified External Data Access Implementation in Formal Concept Analysis Research Toolbox, in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings / Ed. by M. Huchard, S. Kuznetsov. Vol. 1624. M. : Higher School of Economics, National Research University, 2016. P. 285-296.
Galitsky B., Ilvovsky D., Chernyak E. L., Kuznetsov S. Style and Genre Classification by Means of Deep Textual Parsing, in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, July 1-4, 2016) / Ed. by V. Selegey. Issue 15. M. : RSUH Publishing, 2016. P. 171-181.
Korepanova N. V., Kuznetsov S. O. Pattern Structures for Treatment Optimization, in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings / Ed. by M. Huchard, S. Kuznetsov. Vol. 1624. M. : Higher School of Economics, National Research University, 2016. P. 217-229.
Korepanova N. V., Kuznetsov S. O. Choosing a Therapy for an Oncological Disease in Patient Subgroups Based on the Analysis of Closed Descriptions [in Russian], in: Fifteenth National Conference on Artificial Intelligence with International Participation KII-2016 (October 3-7, 2016, Smolensk, Russia): Conference Proceedings Vol. 1. Smolensk : Universum, 2016. P. 352-359.
Rodin I. V., Chernyak E. L., Dubov M., Mirkin B. Visualization of Dynamic Reference Graphs, in: Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing. Stroudsburg, PA : Association for Computational Linguistics, 2016. P. 34-38.
Kashnitsky Y., Kuznetsov S. Interval Pattern Concept Lattice as a Classifier Ensemble, in: Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2016) / Ed. by S. O. Kuznetsov, A. Napoli, S. Rudolph. M., 2016. P. 105-112.
Ignatov D. I., Nikolenko S. I., Abaev T., Poelmans J. Online recommender system for radio station hosting based on information fusion and adaptive tag-aware profiling // Expert Systems with Applications. 2016. Vol. 55. P. 546-558.
Galitsky B., Ilvovsky D. Discovering disinformation: discourse-level approach, in: Fifteenth National Conference on Artificial Intelligence with International Participation KII-2016 (October 3-7, 2016, Smolensk, Russia): Conference Proceedings Vol. 1. Smolensk : Universum, 2016. Ch. 2. P. 23-32.
Kashnitsky Y., Kuznetsov S. Global Optimization in Learning with Important Data: an FCA-Based Approach, in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings / Ed. by M. Huchard, S. Kuznetsov. Vol. 1624. M. : Higher School of Economics, National Research University, 2016. Ch. 19. P. 189-202.
Neznanov A., Parinov A. Distributed Architecture of Data Analysis System based on Formal Concept Analysis Approach, in: Intelligent Distributed Computing IX. Springer, 2015. P. 265-271.
Greene G. J., Dunaiski M., Fischer B., Ilvovsky D., Kuznetsov S. Browsing publication data using tag clouds over concept lattices constructed by key-phrase extraction, in: RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa / Ed. by S. Kuznetsov, B. W. Watson. Vol. 1552. Aachen : CEUR Workshop Proceedings, 2015. P. 10-22.
Galitsky B., Ilvovsky D. Detecting Distorted Information: an Approach Using Discourse Relations [in Russian], in: Fifteenth National Conference on Artificial Intelligence with International Participation KII-2016 (October 3-7, 2016, Smolensk, Russia): Conference Proceedings Vol. 1. Smolensk : Universum, 2016. P. 23-32.
Kashnitsky Y. Lazy Learning of Succinct Classification Rules for Complex Structure Data, in: Supplementary Proceedings of the 5th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2016), Yekaterinburg, Russia, April 7-9, 2016. / Ed. by D. I. Ignatov. Vol. 1710. Aachen : CEUR Workshop Proceedings, 2016. Ch. 8. P. 73-84.
Neznanov A., Parinov A. Full-text Search in Intermediate Data Storage of FCART, in: RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa / Ed. by S. Kuznetsov, B. W. Watson. Vol. 1552. Aachen : CEUR Workshop Proceedings, 2015.
Ganter B., Obiedkov S. Conceptual Exploration. Berlin, Heidelberg : Springer, 2016.
Bobrikov V. V., Nenova E. N., Ignatov D. I. What is a Fair Value of Your Recommendation List?, in: Proceedings of the Third Workshop on Experimental Economics and Machine Learning (EEML 2016), Moscow, Russia, July 18, 2016 / Ed. by R. Tagiew, D. I. Ignatov, A. Hilbert, R. Delhibabu. Vol. 1627. Aachen : CEUR Workshop Proceedings, 2016. P. 1-12.
Kanovich M., Scedrov A., Kuznetsov S. Undecidability of the Lambek calculus with a relevant modality, in: The 21st Conference on Formal Grammar. Springer, 2016. P. 240-256.
Ilvovsky D., Chernyak E. L. Visualisation of Russian newspaper corpus by means of reference graphs, in: RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa / Ed. by S. Kuznetsov, B. W. Watson. Vol. 1552. Aachen : CEUR Workshop Proceedings, 2015. P. 1-9.
Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing. Stroudsburg, PA : Association for Computational Linguistics, 2016.
Scedrov A., Barthe G., Fagerholm E., Fiore D., Schmidt B., Tibouchi M. Strongly-Optimal Structure Preserving Signatures from Type II Pairings: Synthesis and Lower Bounds // IET Information Security. 2016. P. 358-371.
Kanovich M., Kuznetsov S., Scedrov A. On Lambek's Restriction in the Presence of Exponential Modalities, in: Symposium on Logical Foundations of Computer Science (LFCS 2016). Vol. 9537: Logical Foundations of Computer Science. Springer, 2016. P. 146-158.
Wohlgenannt G., Chernyak E. L., Ilvovsky D. Extracting social networks from literary text with word embedding tools, in: Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). Osaka, 2016. Ch. 4. P. 18-26.
Scedrov A., Kanovich M., Kirigin T. B., Nigam V., Talcott C. Timed Multiset Rewriting and the Verification of Time-Sensitive Distributed Systems, in: 14th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2016), 2016. P. 228-244.
Buzmakov A. V., Egho E., Jay N., Kuznetsov S., Napoli A., Raissi C. On mining complex sequential data by means of FCA and pattern structures // International Journal of General Systems. 2016. Vol. 45. No. 2. P. 135-159.