Year of Graduation
Finding triclusters of similar values in triadic multi-valued contexts
School of Applied Mathematics and Information Science
Abstract
Mining of triadic data has become extremely popular. While numerous methods exist for clustering data in binary dyadic, binary triadic, and multi-valued dyadic contexts, only some of them can be extended to the multi-valued triadic case.
In this paper the author develops and studies several algorithms for tri-clustering numerical data in multi-valued numerical contexts, based on the methods of OAC-tri-clustering (Object, Attribute, Condition), compares their performance, applies them to real datasets, and attempts to interpret the results.
Several existing methods that solve similar problems are considered in this paper: OAC-tri-clustering based on prime operators, Conceptual Scaling and its implementation in the TriMax algorithm, and Interordinal Scaling. For various reasons, none of them can be used directly to solve the presented problem. The author therefore proposes two methods for mining tri-clusters of similar values in multi-valued triadic contexts: NOAC (Numerical OAC), which is an extension of prime-based OAC-tri-clustering to the multi-valued case, and the classical K-Means clustering algorithm with a new method of computing distances.
The proposed algorithms were tested on two datasets. The first is a set of computer-generated contexts, used in experiments on general performance and on resistance to noise and data loss. The second is the GroupLens project 100k dataset, which contains anonymized movie rating data collected from the MovieLens web site (http://movielens.org).
The comparison was made in terms of density, variance, and coverage of the studied entities by individual tri-clusters and by the whole resulting set of tri-clusters. Computation time was also included in the list of monitored parameters.
The experiments were conducted on the algorithms with various sets of options in order to identify the optimal settings. As a result, the NOAC algorithm proved superior to the adjusted K-Means method.
Key words: clustering, tri-clustering, OAC-tri-clustering, numerical data, similar values, multi-valued context.
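As an illustration of the density, variance and covering measures named in the abstract, the following minimal Python sketch evaluates a tri-cluster on a toy context. The abstract does not give formal definitions, so the ones used here are assumptions: density is taken as the fraction of cells of the cuboid X × Y × Z that hold a value in the context, variance as the population variance of those values, and covering as the share of the context's triples that fall into at least one tri-cluster. The context, the identifiers and the function names are all hypothetical.

```python
from itertools import product
from statistics import pvariance

# Hypothetical toy context: (object, attribute, condition) -> numeric value.
context = {
    ("u1", "m1", "c1"): 4.0,
    ("u1", "m2", "c1"): 5.0,
    ("u2", "m1", "c1"): 4.0,
}


def tricluster_values(context, objects, attributes, conditions):
    """Values of all triples of the cuboid that are present in the context."""
    cells = product(objects, attributes, conditions)
    return [context[cell] for cell in cells if cell in context]


def density(context, objects, attributes, conditions):
    """Assumed definition: filled cells divided by the cuboid volume."""
    volume = len(objects) * len(attributes) * len(conditions)
    if volume == 0:
        return 0.0
    filled = len(tricluster_values(context, objects, attributes, conditions))
    return filled / volume


def variance(context, objects, attributes, conditions):
    """Assumed definition: population variance of the values in the cuboid."""
    values = tricluster_values(context, objects, attributes, conditions)
    return pvariance(values) if values else 0.0


def coverage(context, triclusters):
    """Assumed definition: share of context triples in at least one tri-cluster."""
    covered = {
        cell
        for objects, attributes, conditions in triclusters
        for cell in product(objects, attributes, conditions)
        if cell in context
    }
    return len(covered) / len(context) if context else 0.0


# One hypothetical tri-cluster: two objects, two attributes, one condition.
tricluster = (["u1", "u2"], ["m1", "m2"], ["c1"])
tricluster_density = density(context, *tricluster)
tricluster_variance = variance(context, *tricluster)
set_coverage = coverage(context, [tricluster])
```

Under these assumed definitions, a tri-cluster of similar values would have a low variance, while density and covering indicate how much of the cuboid and of the whole context it accounts for.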