A Gentle Introduction to Graph Neural Networks (Basics, DeepWalk, and GraphSage)

Recently, Graph Neural Network (GNN) has gained increasing popularity in various domains, including social networks, knowledge graphs, recommender systems, and even life science. The power of GNN in modeling the dependencies between nodes in a graph enables breakthroughs in research areas related to graph analysis. This article aims to introduce the basics of Graph Neural Networks and two more advanced algorithms, DeepWalk and GraphSage.

Graph

Before we get into GNN, let's first understand what a graph is. In Computer Science, a graph is a data structure consisting of two components: vertices and edges. A graph G can be well described by the set of vertices V and the set of edges E it contains.
G = (V, E)
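As a minimal illustration (the vertex names and edges below are my own, not from the article), such a graph can be stored as an adjacency list:

```python
# A small undirected graph G = (V, E) stored as an adjacency list.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# For an undirected graph, each edge contributes to both endpoints' neighbor lists.
adj = {v: [] for v in V}
for u, w in E:
    adj[u].append(w)
    adj[w].append(u)

print(adj)  # {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
```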

Edges can be either directed or undirected, depending on whether there exist directional dependencies between vertices.

The vertices are often called nodes. In this article, these two terms are interchangeable.

Graph Neural Network

Graph Neural Network is a type of neural network that operates directly on the graph structure. A typical application of GNN is node classification. Essentially, every node in the graph is associated with a label, and we want to predict the labels of the nodes without ground truth. This section will illustrate the algorithm described in the paper that first proposed GNN, which is therefore often regarded as the original GNN.

In the node classification problem setup, each node v is characterized by its feature x_v and associated with a ground-truth label t_v. Given a partially labeled graph G, the goal is to leverage the labeled nodes to predict the labels of the unlabeled ones. The model learns to represent each node with a d-dimensional vector (state) h_v that contains the information of its neighborhood. Specifically,

h_v = f(x_v, x_co[v], h_ne[v], x_ne[v])

where x_co[v] denotes the features of the edges connecting with v, h_ne[v] denotes the embeddings of the neighboring nodes of v, and x_ne[v] denotes the features of the neighboring nodes of v. The function f is the transition function that projects these inputs onto a d-dimensional space. Since we are seeking a unique solution for h_v, we can apply the Banach fixed-point theorem and rewrite the above equation as an iterative update process. Such an operation is often referred to as message passing or neighborhood aggregation.

H^{t+1} = F(H^t, X)

H and X denote the concatenation of all the h and x, respectively.
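As a rough illustration of this iterative neighborhood aggregation, here is a minimal NumPy sketch. The transition function f below is a stand-in (a single linear layer with a tanh nonlinearity) and all names are my own; it is not the exact network of the original paper:

```python
import numpy as np

def message_passing(adj, X, d=8, num_iters=10, seed=0):
    """Iteratively update node states h_v from node features and neighbor states.

    adj: dict mapping node index -> list of neighbor indices
    X:   (num_nodes, feat_dim) array of node features x_v
    """
    rng = np.random.default_rng(seed)
    n, feat_dim = X.shape
    W_self = rng.normal(scale=0.1, size=(feat_dim, d))     # acts on x_v
    W_msg = rng.normal(scale=0.1, size=(feat_dim + d, d))  # acts on neighbor messages

    H = np.zeros((n, d))                 # initial states h_v^0
    for _ in range(num_iters):           # iterate towards the fixed point
        H_new = np.zeros_like(H)
        for v in range(n):
            # sum up messages built from neighbor features x_u and neighbor states h_u
            msg = np.zeros(feat_dim + d)
            for u in adj[v]:
                msg += np.concatenate([X[u], H[u]])
            H_new[v] = np.tanh(X[v] @ W_self + msg @ W_msg)  # stand-in for f
        H = H_new
    return H

# Example: a 3-node path graph with one-hot features.
H = message_passing({0: [1], 1: [0, 2], 2: [1]}, np.eye(3))
```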

The output of the GNN is computed by passing the state h_v as well as the feature x_v to an output function g.


o_v = g(h_v, x_v)

Both f and g here can be interpreted as feed-forward fully-connected neural networks. The L1 loss can be straightforwardly formulated as follows:

loss = Σ_i (t_i − o_i)   (summed over the labeled nodes i)

which can be optimized via gradient descent.
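Continuing the toy sketch above, the readout g can likewise be approximated by a single linear layer, with the loss accumulated only over the labeled nodes (again illustrative names and shapes, not the paper's exact formulation):

```python
import numpy as np

def readout_and_loss(H, X, targets, labeled_idx, seed=0):
    """Toy readout o_v = g(h_v, x_v) plus an L1-style loss over labeled nodes.

    targets:     (num_nodes,) array of ground-truth scalars t_v
    labeled_idx: indices of the nodes whose labels are observed
    """
    rng = np.random.default_rng(seed)
    W_out = rng.normal(scale=0.1, size=(H.shape[1] + X.shape[1], 1))

    O = (np.concatenate([H, X], axis=1) @ W_out).ravel()   # one output per node
    loss = np.abs(targets[labeled_idx] - O[labeled_idx]).sum()
    return O, loss
```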

However, this paper points out three main limitations of the original GNN proposal:

1. If the assumption of "fixed point" is relaxed, it is possible to leverage a Multi-layer Perceptron (MLP) to learn a more stable representation and remove the iterative update process. This is because, in the original proposal, different iterations use the same parameters of the transition function f, while the different parameters in different layers of an MLP allow for hierarchical feature extraction.
2. It cannot process edge information (e.g. different edges in a knowledge graph may indicate different relationships between nodes).
3. The fixed point can discourage the diversification of the node distribution, and thus may not be suitable for learning to represent nodes.

Several variants of GNN have been proposed to address the above issues. However, they are not covered here as they are not the focus of this post.

DeepWalk

DeepWalk is the first algorithm to propose learning node embeddings in an unsupervised manner. It highly resembles word embedding in terms of the training process. The motivation is that the distribution of nodes in a graph and of words in a corpus both follow a power law, as shown in the following figure:
[Figure: the frequency distribution of vertices in a graph and of words in a corpus both follow a power law]

The algorithm contains two steps:

  • 1 Perform random walks on nodes in a graph to generate node sequences
  • 2 Run skip-gram to learn the embedding of each node based on the node sequences generated in step 1

At each time step of the random walk, the next node is sampled uniformly from the neighbors of the previous node. Each sequence is then truncated into sub-sequences of length 2|w| + 1, where w denotes the window size in skip-gram. If you are not familiar with skip-gram, my previous blog post will brief you on how it works.
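A minimal sketch of step 1 (the function and parameter names here are my own, not from the paper):

```python
import random

def generate_walks(adj, walks_per_node=10, walk_length=40, seed=0):
    """Step 1 of DeepWalk: uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)                 # visit start nodes in a random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:          # stop early at an isolated node
                    break
                walk.append(rng.choice(neighbors))  # uniform neighbor sampling
            walks.append(walk)
    return walks

# Step 2 would feed these sequences to a skip-gram model
# (e.g. gensim's Word2Vec with sg=1), treating each node id as a "word".
```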

In this paper, hierarchical softmax is applied to address the costly computation of the softmax caused by the huge number of nodes. To compute the softmax value of an individual output element, we must compute e^x_k for every element k.
softmax(x_k) = e^x_k / Σ_i e^x_i

Therefore, the computation time is O(|V|) for the original softmax, where V denotes the set of vertices in the graph.

Hierarchical softmax utilizes a binary tree to deal with this problem. In this binary tree, all the leaves (v1, v2, …, v8 in the figure below) are the vertices of the graph. In each of the inner nodes, there is a binary classifier that decides which path to choose. To compute the probability of a given vertex v_k, one simply computes the probability of each sub-path along the path from the root node to the leaf v_k. Since the probabilities of each node's children sum to 1, the property that the probabilities of all the vertices sum to 1 still holds in the hierarchical softmax. The computation time for an element is now reduced to O(log|V|), as the longest path in a binary tree is bounded by O(log(n)), where n is the number of leaves.
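A small sketch of this idea: the probability of a given leaf is the product of the binary decisions taken along its root-to-leaf path, so only O(log|V|) classifiers are evaluated (the path encoding and names below are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaf_probability(path, hidden, inner_vectors):
    """Probability of one leaf (vertex) under hierarchical softmax.

    path:          list of (inner_node_id, go_left) decisions from root to leaf
    hidden:        the hidden/context vector being scored
    inner_vectors: dict inner_node_id -> parameters of that node's binary classifier
    """
    prob = 1.0
    for node_id, go_left in path:
        p_left = sigmoid(np.dot(inner_vectors[node_id], hidden))
        prob *= p_left if go_left else (1.0 - p_left)  # siblings' probabilities sum to 1
    return prob
```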

[Figure: a hierarchical softmax binary tree whose leaves v1, v2, …, v8 are the vertices of the graph]

After a DeepWalk model is trained, it has learned a good representation of each node, as shown in the following figure. Different colors indicate different labels in the input graph. We can see that in the output graph (an embedding with 2 dimensions), nodes having the same label are clustered together, while most nodes with different labels are separated properly.
[Figure: the input graph (colors indicate labels) and the learned 2-dimensional node embeddings]

However, the main issue with DeepWalk is that it lacks the ability to generalize. Whenever a new node comes in, the model has to be re-trained in order to represent this node (transductive learning). Thus, such an approach is not suitable for dynamic graphs where the nodes are ever-changing.

GraphSage

GraphSage provides a solution to the aforementioned problem, learning the embedding for each node in an inductive way. Specifically, each node is represented by the aggregation of its neighborhood. Thus, even if a node unseen during training appears in the graph, it can still be properly represented by its neighboring nodes. The algorithm of GraphSage is shown below.

[Algorithm: GraphSage embedding generation (forward propagation) pseudocode]

The outer loop indicates the number of update iterations, while h^k_v denotes the latent vector of node v at update iteration k. At each update iteration, h^k_v is updated based on an aggregation function, the latent vectors of v and of v's neighborhood from the previous iteration, and a weight matrix W^k. The paper proposes three aggregator functions, described after the sketch below.
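A rough sketch of that update loop, with all names my own and `aggregate` standing in for any of the three aggregators:

```python
import numpy as np

def graphsage_forward(adj, X, weights, aggregate, K=2):
    """Sketch of GraphSage forward propagation for every node.

    adj:       dict node index -> list of neighbor indices
    X:         (num_nodes, feat_dim) input features; h^0_v = x_v
    weights:   list of K matrices W^k, each shaped (2 * d_in, d_out)
    aggregate: maps a list of neighbor vectors to a single vector
    """
    H = X.astype(float)
    for k in range(K):                                         # outer loop over iterations
        H_new = []
        for v in range(len(H)):
            h_neigh = aggregate([H[u] for u in adj[v]])        # aggregate the neighborhood
            h_cat = np.concatenate([H[v], h_neigh])            # concatenation ("skip-connection")
            h_v = np.maximum(h_cat @ weights[k], 0.0)          # W^k followed by a ReLU
            H_new.append(h_v / (np.linalg.norm(h_v) + 1e-12))  # l2-normalize the new state
        H = np.vstack(H_new)
    return H
```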

1. Mean aggregator:

The mean aggregator takes the average of the latent vectors of a node and all its neighbors.
h^k_v = σ(W · mean({h^{k−1}_v} ∪ {h^{k−1}_u, ∀u ∈ N(v)}))

Compared with the original update, it removes the concatenation operation at line 5 of the above pseudocode. That concatenation can be viewed as a "skip-connection", which the paper later shows to largely improve the performance of the model.
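In the sketch above, the mean aggregator variant would therefore fold the node's own vector into the average and skip the concatenation step, roughly:

```python
import numpy as np

def mean_aggregator_update(h_v, neighbor_vectors, W):
    """GraphSage mean-aggregator style update (no separate skip-connection term)."""
    stacked = np.vstack([h_v] + list(neighbor_vectors))
    h_mean = stacked.mean(axis=0)        # average of the node and all its neighbors
    return np.maximum(h_mean @ W, 0.0)   # linear transform W plus a ReLU nonlinearity
```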

2. LSTM aggregator:

Since the nodes in the graph don't have any natural order, the authors assign an order by randomly permuting the nodes before feeding them to the LSTM.

3. Pooling aggregator:

This aggregator performs an element-wise pooling function over the neighboring set. Below is an example of max-pooling:

AGGREGATE^pool_k = max({σ(W_pool · h^k_{u_i} + b), ∀u_i ∈ N(v)})

The max operation can be replaced with mean-pooling or any other symmetric pooling function. The paper points out that the pooling aggregator performs the best, while the mean-pooling and max-pooling aggregators have similar performance; max-pooling is used as the default aggregation function.
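A sketch of such a pooling aggregator (the per-neighbor dense layer and its parameters are illustrative):

```python
import numpy as np

def max_pooling_aggregator(neighbor_vectors, W_pool, b_pool):
    """Element-wise max over transformed neighbor vectors.

    Swapping np.max for np.mean gives mean-pooling; any symmetric
    (order-independent) pooling function can be used instead.
    """
    transformed = [np.maximum(h @ W_pool + b_pool, 0.0) for h in neighbor_vectors]
    return np.max(np.vstack(transformed), axis=0)
```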

The loss function is defined as the following:

J(z_u) = −log(σ(z_u^T z_v)) − Q · E_{v_n ∼ P_n(v)} log(σ(−z_u^T z_{v_n}))

where u and v co-occur in a fixed-length random walk, while the v_n are negative samples that do not co-occur with u. Such a loss function encourages nearby nodes to have similar embeddings, while nodes far apart are separated in the projected space. Via this approach, the nodes gain more and more information about their neighborhoods.
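A sketch of this loss for a single positive pair (z_u, z_v) and a handful of negative samples, following the description above (σ is the sigmoid; the sum over negatives stands in for the expectation scaled by the number of samples Q):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(z_u, z_v, z_negatives):
    """Pull co-occurring nodes (u, v) together and push negative samples v_n apart."""
    pos_term = -np.log(sigmoid(np.dot(z_u, z_v)) + 1e-12)
    neg_term = -sum(np.log(sigmoid(-np.dot(z_u, z_n)) + 1e-12) for z_n in z_negatives)
    return pos_term + neg_term
```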

GraphSage enables representative embeddings to be generated for unseen nodes by aggregating their neighboring nodes. It allows node embeddings to be applied to domains involving dynamic graphs, where the structure of the graph is ever-changing. Pinterest, for example, has adopted an extended version of GraphSage, PinSage, as the core of its content discovery system.

Conclusion

You have learned the basics of Graph Neural Networks, DeepWalk, and GraphSage. The power of GNN in modeling complex graph structures is truly astonishing. In view of its effectiveness, I believe that GNN will play an important role in the development of AI in the near future. If you like my article, don't forget to follow me on Medium and Twitter, where I frequently share the latest developments in AI, ML, and DL.