Ordered Sets in Data Analysis

Master 2021/2022

Type: Elective course (Data Science)

Area of studies: Applied Mathematics and Informatics

Delivered by: School of Data Analysis and Artificial Intelligence

Where: Faculty of Computer Science

When: 1 year, 1, 2 module

Mode of studies: offline

Open to: students of one campus

Instructors: Sergei Kuznetsov, Fedor Vladimirovich Strok

Master’s programme: Data Science

Language: English

ECTS credits: 5

Contact hours: 52

Full Syllabus Ask Question

Abstract

“Ordered sets for data analysis” is a first-semester master course within “Data Science” master program. Order relations are ubiquitous, we meet them when we consider numbers, Boolean algebras, partitions, multisets, graphs, logical formulas, and many other mathematical entities. On the one hand, order relations are used for representing data and knowledge, on the other hand they serve as important tools for describing models and methods of data analysis, such as decision trees, random forests, version spaces, association rules, etc. We start from the basic notions of binary relations, their representations with matrices and graphs, consider properties of relations and operations on them, giving examples on problems of data analysis. We come then to the notions of order, semilattice, and lattice. Very important data models like numbers, vectors of numbers, set algebras, partitions make lattices. We consider an important type of lattices, called concept lattices, which constitute the main subject of Formal Concept Analysis (FCA), a modern domain of applied lattice theory. Concept lattices provide very useful tools for data analysis and pattern mining. On the one hand, formal concepts, the elements of a concept lattice, describe sets of objects sharing sets of common attributes, thus presenting a nice model of (bi)clustering. Ordered sets of concepts give a hierarchy of classes of a subject domain, which makes a backbone of domain ontology. On the other hand, concept lattices represent dependencies on attributes, both precise (called implications), and imprecise, known in data mining as association rules. We consider most important algorithms and algorithmic problems related to applications of FCA. Concept lattices allow us to describe some fundamental models of data analysis and pattern mining such as decision trees, random forests, version spaces, and concept-based hypotheses. Since a serious limitation of many methods of pattern mining is computational complexity, complexity analysis makes an important leitmotif of the course. Throughout the course we consider applications of the introduced notions in various applied domains, such as text analysis, medicine, pharmacology, etc. Homework consists of both theoretical problems and practical problems in designing particular.

Learning Objectives

To help students to synthesize knowledge from undergraduate courses on discrete mathematics, algorithmic theory, data analysis to get a stereoscopic picture of the foundations of pattern mining.
Providing students with some important theoretical knowledge and practical skills that would help them to see the relationship of various methods of pattern mining and knowledge discovery.

Expected Learning Outcomes

Be capable of realizing and testing basic learning and classification model.
Be familiar with Association rules in Data Mining, their support and confidence.
Be familiar with complexity classes, (polynomial) reduction between problems, problem hardness and completeness.
Be familiar with concept-based (JSM) hypotheses.
Be familiar with implications, functional dependencies, Armstrong rules, and implication bases.
Be familiar with semilattices and lattices, and know their properties.
Be familiar with version space, minimal generators and minimal hypotheses.
Develop mathematical models based on order theory and Galois connections.
Know basic notions and terminology of algorithmic complexity.
Know basic notions and terminology of order theory and symbolic data analysis.
Know basic notions and terminology of relations, graphs, and matrices as data structures.
Know differense between Decision trees and hypothesis.
Understand basic notions of dependencies in data.
Understand clustering and biclustering, and usage of Formal concepts as biclusters.
Understand Formal Concept Analysis (FCA) and its Basic theorem.
Understand fundamental principles of learning and classification through Concept Lattices.
Understand fundamental principles of symbolic pattern mining.
Understand fundamental principles of working with complex data using pattern structures and pattern setups.
Understand Interestingness measures of patterns.
Understand the main algorithms for Pattern Mining with concept lattices.

Course Contents

Introduction. Relations, Graphs, and Matrices as Data Structures
Algorithmic Complexity for Data Science
Orders, Their Graphs, and Diagrams for Data and Knowledge Representation
Semilattices and Lattices
Galois Connections and Formal Concept Analysis (FCA)
Dependencies in Data: Implications and Functional Dependencies
Association Rules and Concept Lattices for Data Mining
Algorithms for Pattern Mining with Concept Lattices
Biclustering and Interestingness Measures of Patterns
Models of Learning and Classification Through Concept Lattices
Working with Complex Data: Pattern Structures and Pattern Setups

Assessment Elements

Homework
Research project
Exam
Students have to demonstrate knowledge of theory, but the most of tasks would evaluate their ability to solve practical examples, present straight operation, and recognition skills to solve them.

Interim Assessment

2021/2022 1st module
0.48*Homework + 0.32*Research Project + 0.2*Exam
2021/2022 2nd module
0.48*Homework + 0.32*Research Project + 0.2*Exam

Bibliography

Recommended Core Bibliography

Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. Introduction to Algorithms (3rd edition). – MIT Press, 2009. – 1292 pp.
Kuznetsov, S. O. Fitting pattern structures to knowledge discovery in big data // International conference on formal concept analysis. – Springer, Berlin, Heidelberg, 2013. – PP. 254-266.

Recommended Additional Bibliography

Kuznetsov, S. O. Pattern structures for analyzing complex data // International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing. – Springer, Berlin, Heidelberg, 2009. – P. 33-44.
Kuznetsov, S. O. Scalable knowledge discovery in complex data with pattern structures // International Conference on Pattern Recognition and Machine Intelligence. – Springer, Berlin, Heidelberg, 2013. – P. 30-39.

Course Syllabus