Student: Semenova Natalya Aleksandrovna
Thesis title: Graph Embedding for Text Attributed Network
Year of defense: 2019
Owing to the ubiquitous use of information technologies, networks (graphs) are becoming increasingly useful for capturing interactions between entities in areas such as the social sciences, linguistics, recommender systems, biology, and telecommunications. These networks hold such large volumes of data that they are hard to analyze in terms of feature extraction. Network representation models, also known as graph embeddings, have attracted considerable research interest and become popular within the network science community. When this research direction first appeared, such algorithms were intended for dimensionality reduction, but researchers subsequently found that the techniques can be applied to a variety of tasks.

In this paper we compare network representation methods that operate on structural features alone with methods that also use auxiliary information such as text attributes, as applied to the link prediction machine learning task, i.e., predicting either missing edges or edges that may appear in the future in an evolving network. We base our research on existing graph embedding techniques and several ways of representing text attributes (e.g., word representations). The research problem is to investigate the impact of text attributes on the construction of graph embeddings and to evaluate it on the link prediction task. We try to understand whether it is effective to use auxiliary text information about graph components (nodes or edges) as a basis for a graph embedding. For our experiment we collected a dataset consisting of all articles, conference papers, reviews, book chapters, and editorials published by HSE authors and indexed in Scopus (1970-present), accompanied by the abstract, keywords, year of publication, number of pages, etc. From this dataset we built a co-authorship network in which nodes are authors and edges represent collaboration between them.
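As an illustration of the co-authorship construction described above, here is a minimal sketch in plain Python. The record layout (one list of author names per publication), the function name `build_coauthorship_edges`, and the toy author identifiers are assumptions made for this example; they are not taken from the thesis or from the Scopus export format.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship_edges(papers):
    """Map each unordered author pair to the number of papers they share."""
    edges = Counter()
    for authors in papers:
        # set() drops repeated names within one paper; sorted() makes
        # (a, b) and (b, a) resolve to the same dictionary key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy records: each inner list is the author list of one paper.
papers = [
    ["author_1", "author_2", "author_3"],
    ["author_2", "author_3"],
    ["author_4"],  # a single-author paper adds a node but no edge
]

edges = build_coauthorship_edges(papers)
nodes = sorted({name for authors in papers for name in authors})
```

In this sketch the edge weight is the number of joint papers. Whether the thesis uses weighted or unweighted edges is not stated in the abstract.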

To verify our hypothesis, we evaluate structural and text attribute representations on the link prediction task, framed as binary classification. We tested two classifiers, RandomForest and XGBoost, and used standard quality measures: Precision, Accuracy, F1-macro, F1-micro, Logloss, and ROC AUC. Experiments showed that embeddings of text attributes (abstracts) achieve results comparable to structural embeddings produced by node2vec on both the train and test sets. Moreover, we propose an operator for attribute embeddings, based on distance measures, that obtains edge embeddings from node representations; tests showed that this method is workable.
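The abstract does not specify which distance measures the proposed operator uses, so the following is only a sketch of the general idea: turning two node embeddings into one edge feature vector that a binary classifier could consume. The functions `abs_difference`, `squared_difference`, `cosine_distance`, and `edge_features` are hypothetical names chosen for this example, and the particular combination of distances is an assumption, not the author's operator.

```python
import math


def abs_difference(u, v):
    """Element-wise absolute difference (an L1-style operator)."""
    return [abs(a - b) for a, b in zip(u, v)]


def squared_difference(u, v):
    """Element-wise squared difference (an L2-style operator)."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


def cosine_distance(u, v):
    """1 minus cosine similarity; defined as 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def edge_features(u, v):
    """Assumed edge embedding: per-dimension distances plus one scalar distance."""
    return abs_difference(u, v) + [cosine_distance(u, v)]


# Hypothetical 2-dimensional node embeddings for a candidate author pair.
features = edge_features([1.0, 0.0], [0.0, 1.0])
```

Feature vectors of this kind, computed for pairs of authors who did and did not collaborate, would form the positive and negative examples for the binary classifiers named above.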

All students at HSE University complete a final qualifying thesis in accordance with the University Regulations and the rules set by each degree programme.

Abstracts of all theses must be published in open access on the HSE University corporate portal.

The full text of a thesis is made openly available on the HSE University portal only with the consent of the student who authored it (the rights holder) or, for a thesis written by a group of students, with the consent of all co-authors (rights holders). Once posted on the HSE University portal, a thesis acquires the status of an electronic publication.

Theses are subject to copyright, and their use is governed by the restrictions set out in the intellectual property legislation of the Russian Federation.

Any use of a thesis, including quotation, must cite the author's name and the source.
