
Student: Dmitrij Egurnov
Title: Finding triclusters of similar values in triadic multi-valued contexts
Faculty: School of Applied Mathematics and Information Science
Educational Programme: Bachelor’s programme
Year of Graduation: 2014
Abstract

Mining of triadic data has become extremely popular. While numerous methods exist for clustering data in binary dyadic, binary triadic and multi-valued dyadic contexts, only some of them can be extended to the multi-valued triadic case.

In this paper the author aims to develop and study several algorithms for tri-clustering numerical data in multi-valued numerical contexts, based on the methods of OAC-tri-clustering (Object, Attribute, Condition), to compare their performance, to apply them to real datasets and to interpret the results.

Several existing methods that solve similar problems are considered: OAC-tri-clustering based on prime operators, Conceptual Scaling and its implementation in the TriMax algorithm, and Interordinal Scaling. For various reasons none of them can be applied directly to the presented problem. The author therefore proposes two methods for mining tri-clusters of similar values in multi-valued triadic contexts: NOAC (Numerical OAC), an extension of prime-based OAC-tri-clustering to the multi-valued case, and the classical K-Means clustering algorithm with a new method of computing distances.

The proposed algorithms were tested on two datasets. The first is a set of computer-generated contexts, used for experiments on general performance and on resistance to noise and data loss. The second is the GroupLens project 100k dataset, which contains anonymized movie rating data collected from the MovieLens web site (http://movielens.org).

The comparison was made in terms of density, variance and coverage of the studied entities, both for individual tri-clusters and for the whole resulting set of tri-clusters. Computation time was monitored as well. The experiments were run with various sets of options to identify the optimal settings.

As a result, the NOAC algorithm outperformed the adjusted K-Means method.

Key words: clustering, tri-clustering, OAC-tri-clustering, numerical data, similar values, multi-valued context.
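
The abstract does not spell out how NOAC builds its tri-clusters, so the following Python sketch is only an illustration of the general idea of extending prime-based OAC-tri-clustering to numeric values. It assumes a similarity threshold `delta` (a hypothetical parameter name): for each filled cell (object, attribute, condition), the tri-cluster collects the objects, attributes and conditions whose values along the three lines through that cell differ from the cell's value by at most `delta`. The function names, the dictionary layout of the context, and the exact formulas for density and variance are assumptions made for this example, not the thesis's definitions.

```python
from collections import defaultdict


def noac_triclusters(context, delta):
    """Sketch of delta-based tri-cluster generation (assumed formulation).

    context: dict mapping (object, attribute, condition) -> numeric value.
    delta:   hypothetical threshold for "similar" values.
    """
    by_mb = defaultdict(dict)  # (attribute, condition) -> {object: value}
    by_gb = defaultdict(dict)  # (object, condition)    -> {attribute: value}
    by_gm = defaultdict(dict)  # (object, attribute)    -> {condition: value}
    for (g, m, b), v in context.items():
        by_mb[(m, b)][g] = v
        by_gb[(g, b)][m] = v
        by_gm[(g, m)][b] = v

    def similar(line, v):
        # Keys on this line whose value is within delta of v.
        return frozenset(k for k, w in line.items() if abs(w - v) <= delta)

    triclusters = set()  # a set removes duplicates produced by different cells
    for (g, m, b), v in context.items():
        triclusters.add((
            similar(by_mb[(m, b)], v),
            similar(by_gb[(g, b)], v),
            similar(by_gm[(g, m)], v),
        ))
    return triclusters


def _values(tricluster, context):
    objs, attrs, conds = tricluster
    return [context[(g, m, b)]
            for g in objs for m in attrs for b in conds
            if (g, m, b) in context]


def density(tricluster, context):
    """Share of the tri-cluster's cells that are filled in the context."""
    objs, attrs, conds = tricluster
    return len(_values(tricluster, context)) / (len(objs) * len(attrs) * len(conds))


def variance(tricluster, context):
    """Population variance of the filled values inside the tri-cluster."""
    vals = _values(tricluster, context)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

For rating data such as MovieLens, the triples could be (user, movie, some third dimension) with the rating as the value; which third dimension the thesis uses is not stated in the abstract. A larger `delta` merges more cells into each tri-cluster at the cost of higher variance.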

