Clustering User Profiles Based on Their Interests
Applied Mathematics and Information Science
This thesis studies the problem of organizing customers into groups with similar shopping preferences. The main difficulty is handling extremely sparse, large-scale data and implementing the algorithms in a memory-efficient way. The goals are twofold: to implement a hierarchical clustering algorithm with several distance functions in order to group users into clusters based on their interest in certain products, and to investigate which distance functions yield more reliable clusters. Hierarchical clustering was chosen because the number of clusters into which the data should be divided is not known in advance; the method makes it possible to examine the cluster structure visually and to choose the number of clusters after analysis. A small part of the results was reviewed manually to verify that the obtained clusters are valid and meaningful. The quality of the implemented distance functions was also investigated. When tested on the provided data set, the algorithm produced clusters consistent with the users' real preferences.
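As an illustration of the approach summarized above, the following is a minimal sketch and not the implementation developed in the thesis. It assumes that each user profile is stored sparsely as a set of product identifiers, that the distance function is the Jaccard distance, and that clusters are merged by average linkage; the abstract does not name the distance functions or the linkage actually used, so all three choices, along with the names `jaccard_distance`, `agglomerative`, and the toy `profiles` data, are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of product IDs (assumed metric)."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / union


def agglomerative(profiles, distance=jaccard_distance):
    """Naive average-linkage agglomerative clustering.

    profiles: dict mapping a user ID to the set of product IDs of interest.
    Only the products a user actually touched are stored, which keeps
    very sparse data compact.
    Returns the merge history as (left, right, distance) tuples, where
    left and right are tuples of user IDs.
    """
    def linkage(pair):
        # Average pairwise distance between the members of two clusters.
        left, right = pair
        total = sum(distance(profiles[u], profiles[v])
                    for u in left for v in right)
        return total / (len(left) * len(right))

    clusters = [(user,) for user in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        left, right = min(combinations(clusters, 2), key=linkage)
        merges.append((left, right, linkage((left, right))))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Hypothetical toy data: two users interested in electronics, two in clothing.
profiles = {
    "u1": {"tv", "laptop"},
    "u2": {"tv", "laptop", "phone"},
    "u3": {"shoes", "jacket"},
    "u4": {"shoes", "jacket", "hat"},
}
for left, right, dist in agglomerative(profiles):
    print(left, right, round(dist, 3))
```

The merge history plays the role of a dendrogram: cutting it at a chosen distance threshold gives the clusters, so the number of clusters does not have to be fixed beforehand. This naive version recomputes linkages on every pass and is intended only for small examples, not for data at the scale discussed in the thesis.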