Year of Graduation
Learning Node Embeddings in Graphs
Mathematical Methods of Optimization and Stochastics
In this work we consider the problem of learning node representations in graphs for machine learning tasks. Node representation learning is treated as a universal way of processing graph data that does not require supervision. We begin by describing the problem and its relevance, and then briefly review existing approaches based on matrix factorization and random walks. Next, we motivate the development of new algorithms and outline their concept: one method is based on a structural loss function, the other on matrix factorization. After the introduction and the formal problem statement, we describe the modern methods used for analysis and comparison with the new ones. These are the matrix factorization methods SVD, NMF, and BigClam, and the random walk methods DeepWalk and Node2vec. The new methods are the Sparse Gamma Model and the structural density loss function. We then list the properties desirable in an algorithm that learns vector representations: (1) the method is universal across problems on graphs; (2) it is scalable; (3) it is parameter-free; (4) it makes no complex assumptions about the nature of the data; (5) it uses the graph structure directly; (6) it selects the effective embedding dimension automatically. All methods are evaluated against these criteria in a comparative analysis, and the structural density loss function satisfies all of them. The work closes with experiments and conclusions. The experiments cover a synthetic example and a real-world example of the classification problem on graphs, where the structural density loss function performs on par with the best methods. Special attention is paid to the task of detecting overlapping communities, in which the Sparse Gamma Model proved to be the best model.