arXiv:2003.04078v2 [cs.LG] 15 Mar 2020

A Survey on The Expressive Power of Graph Neural Networks

Ryoma Sato (r.sato@ml.ist.i.kyoto-u.ac.jp)
Kyoto University / RIKEN AIP

Abstract

Graph neural networks (GNNs) are effective machine learning models for various graph learning problems. Despite their empirical successes, the theoretical limitations of GNNs have been revealed recently. Consequently, many GNN models have been proposed to overcome these limitations. In this survey, we provide a comprehensive overview of the expressive power of GNNs and provably powerful variants of GNNs.

1 Introduction

Graph neural networks (GNNs) (Gori et al., 2005; Scarselli et al., 2009) are effective machine learning models for various graph-related problems, including chemo-informatics (Gilmer et al., 2017; Zhang et al., 2018), recommender systems (Ying et al., 2018; Wang et al., 2019a,b; Fan et al., 2019; Gong et al., 2019), question-answering systems (Schlichtkrull et al., 2018; Park et al., 2019), and combinatorial problems (Khalil et al., 2017; Li et al., 2018; Gasse et al., 2019). Comprehensive surveys on GNNs were provided by Hamilton et al. (2017a), Zhou et al. (2018), and Wu et al. (2019).

Despite the empirical successes of GNNs in various fields, Xu et al. (2019) and Morris et al. (2019) demonstrated that GNNs cannot distinguish some pairs of graphs. This indicates that GNNs cannot classify these graphs correctly with any choice of parameters unless the labels of these graphs are the same. This result contrasts with the universal approximation power of multilayer perceptrons (Cybenko, 1989; Hornik et al., 1989; Hornik, 1991). Furthermore, Sato et al. (2019a) showed that GNNs are at most as powerful as distributed local algorithms (Angluin, 1980; Suomela, 2013). Thus, besides the graph isomorphism problem, there are many combinatorial problems that GNNs cannot solve.

Consequently, various provably powerful GNN models have been proposed to overcome the limitations of GNNs. This survey provides an extensive overview of the expressive power of GNNs and of the GNN models that overcome these limitations. Unlike other surveys on GNNs, which introduce architectures and applications, this survey focuses on the theoretical properties of GNNs.

Table 1: Notations.

  Notation                                  Description
  {.}                                       A set.
  {{.}}                                     A multiset.
  [n]                                       The set {1, 2, ..., n}.
  a, \mathbf{a}, \mathbf{A}                 A scalar, a vector, and a matrix.
  \mathbf{A}^\top                           The transpose of \mathbf{A}.
  G = (V, E)                                A graph.
  G = (V, E, \mathbf{X})                    A graph with attributes.
  V                                         The set of nodes in a graph.
  E                                         The set of edges in a graph.
  n                                         The number of nodes.
  m                                         The number of edges.
  N(v)                                      The set of the neighboring nodes of node v.
  deg(v)                                    The degree of node v.
  \mathbf{X} = [x_1, ..., x_n]^\top \in \mathbb{R}^{n \times d}    The feature matrix.
  h_v^{(l)} \in \mathbb{R}^{d_l}            The embedding of node v in the l-th layer (Eq. 2).
  f_{aggregate}^{(l)}                       The aggregation function in the l-th layer (Eq. 1).
  f_{update}^{(l)}                          The update function in the l-th layer (Eq. 2).
  f_{readout}                               The readout function (Eq. 3).
  \Delta                                    The maximum degree of input graphs.
  b(k)                                      The k-th Bell number.
  H(k)                                      The k-th harmonic number H(k) = 1/1 + 1/2 + ... + 1/k.

This survey is organized as follows. In the rest of this chapter, we introduce notations and briefly review the standard GNN models. In Section 2, we see that GNNs cannot distinguish some graphs using elementary arguments and concrete examples. In Section 3, we introduce the connection between GNNs and the WL algorithm. In Section 4, we introduce the combinatorial problems that GNNs can and cannot solve in the light of the connection with distributed local algorithms. In Section 5, we summarize the relationships among GNNs, the WL algorithm, and distributed local algorithms as the XS correspondence.
1.1 Notations

In this section, we introduce the notations we use in this survey. {.} denotes a set, and {{.}} denotes a multiset. A multiset is a set with possibly repeating elements. For example, {3, 3, 4} = {3, 4}, but {{3, 3, 4}} \neq {{3, 4}}. We sometimes regard a set as a multiset and vice versa. For every positive integer n \in \mathbb{Z}_+, [n] denotes the set {1, 2, ..., n}. A lowercase letter, such as a, b, and c, denotes a scalar; a bold lowercase letter, such as \mathbf{a}, \mathbf{b}, and \mathbf{c}, denotes a vector; and a bold uppercase letter, such as \mathbf{A}, \mathbf{B}, and \mathbf{C}, denotes a matrix or a tensor. \mathbf{A}^\top denotes the transpose of \mathbf{A}. For vectors a \in \mathbb{R}^a and b \in \mathbb{R}^b, [a, b] \in \mathbb{R}^{a+b} denotes the concatenation of a and b. A graph is represented as a pair of the set V of nodes and the set E of edges. n denotes the number of nodes and m denotes the number of edges when the graph is clear from the context. For each node v, N(v) denotes the set of the neighboring nodes of v, and deg(v) denotes the number of neighboring nodes of v. If a graph involves node feature vectors \mathbf{X} = [x_1, ..., x_n]^\top \in \mathbb{R}^{n \times d}, the graph is represented as a tuple G = (V, E, \mathbf{X}). Table 1 summarizes the notations.
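As a minimal illustration of the distinction between {.} and {{.}}, the following Python snippet uses collections.Counter as a stand-in for a multiset. The concrete numbers are arbitrary; the point is that repeated elements are collapsed in a set but not in a multiset, which matters later because the aggregation in Eq. (1) receives the multiset of neighbor embeddings.

```python
from collections import Counter

# {3, 3, 4} and {3, 4} are equal as sets: repetitions are ignored.
assert {3, 3, 4} == {3, 4}

# As multisets, {{3, 3, 4}} and {{3, 4}} differ, because the multiplicity
# of each element is part of the object. collections.Counter is a
# convenient stand-in for a multiset.
assert Counter([3, 3, 4]) != Counter([3, 4])

# The distinction matters for GNNs: the aggregation in Eq. (1) receives the
# multiset of neighbor embeddings {{ h_u | u in N(v) }}, so neighbors with
# repeated feature values (1, 1, 2) and neighbors with values (1, 2) present
# different inputs to f_aggregate, even though the underlying sets coincide.
neighbors_of_v = Counter([1, 1, 2])
neighbors_of_w = Counter([1, 2])
assert set(neighbors_of_v) == set(neighbors_of_w)   # same support
assert neighbors_of_v != neighbors_of_w             # different multisets
```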
1.2 Problem Setting

This survey focuses on the following node classification problem and graph classification problem.

Node Classification Problem
Input: A graph G = (V, E, X) and a node v \in V.
Output: The label y_v of v.

Graph Classification Problem
Input: A graph G = (V, E, X).
Output: The label y_G of G.

In particular, we consider the class of functions f: (G, v) \mapsto y_v and f: G \mapsto y_G that GNNs can compute, because GNNs cannot model all functions on graphs, as we will see later.

1.3 Graph Neural Networks

In this section, we briefly introduce the standard GNN models.

History of GNNs. Sperduti and Starita (1997) and Baskin et al. (1997) first proposed GNN-like models. They extracted features from graph data using neural networks instead of hand-engineered graph fingerprints. Sperduti and Starita (1997) recursively applied a linear aggregation operation and a non-linear activation function, and Baskin et al. (1997) used parameter sharing to model the invariant transformations on the node and edge features. These characteristics are common with modern GNNs. Gori et al. (2005) and Scarselli et al. (2009) proposed novel graph learning models that used recursive aggregation and called these models graph neural networks. It should be noted that in this survey, the term GNNs does not stand only for their models; rather, it is the general term for the following variants of their models. Li et al. (2016) extended the idea of Gori et al. (2005) and Scarselli et al. (2009) to Gated Graph Neural Networks. Molecular Graph Network (Merkwirth and Lengauer, 2005) is a concurrent model of graph neural networks with a similar architecture, which uses a constant number of layers. Duvenaud et al. (2015) constructed a GNN model inspired by circular fingerprints. Dai et al. (2016) proposed a GNN model inspired by the kernel message passing algorithm (Song et al., 2010, 2011). Gilmer et al. (2017) characterized GNNs using the message passing mechanism to provide a unified view of GNNs. In this survey, we do not consider spectral variants of GNN models, such as those by Bruna et al. (2014) and Defferrard et al. (2016), but spatial methods based on the message passing mechanism.

Message Passing Mechanism. In the light of the message passing mechanism, L-layered GNNs can be formulated as follows:

  h_v^{(0)} = x_v                                                      (\forall v \in V),
  a_v^{(k)} = f_{aggregate}^{(k)}({{ h_u^{(k-1)} \mid u \in N(v) }})   (\forall k \in [L], v \in V),    (1)
  h_v^{(k)} = f_{update}^{(k)}(h_v^{(k-1)}, a_v^{(k)})                 (\forall k \in [L], v \in V),    (2)

where f_{aggregate}^{(k)} and f_{update}^{(k)} are (parameterized) functions. Here, h_u^{(k-1)} can be seen as a "message" of node u in the k-th message-passing phase. Each node aggregates messages from its neighboring nodes to compute the next message or embedding. GNNs classify node v based on the final embedding h_v^{(L)}. When no node features x_v are available, we use the one-hot degree vector as the initial embedding, following Xu et al. (2019) and Knyazev et al. (2019). This scheme is illustrated in Figures 1 and 2, where colors stand for features and embeddings. The same color indicates the same vector.

Figure 1: One-layered message passing graph neural networks.

Figure 2: Two-layered message passing graph neural networks.

In this example, one-layered GNNs cannot distinguish the nodes with embedding 3 in the lower-left graph in Figure 1. This indicates that if these nodes have different class labels, one-layered GNNs always fail to classify these nodes correctly because GNNs classify a node based only on the final embedding. In contrast, two-layered GNNs distinguish all nodes, as Figure 2 shows. In addition to these structural limitations, f_{aggregate}^{(k)} and f_{update}^{(k)} are not necessarily injective in general. For example, it is possible that f_{aggregate}^{(k)}({{1, 1, 2}}) = f_{aggregate}^{(k)}({{1, 1}}) holds. This imposes more limitations on GNNs. This survey aims to determine the properties of graphs that GNNs can and cannot recognize.

In the graph classification problem, GNNs compute the graph embedding h_G using the readout function

  h_G = f_{readout}({{ h_v^{(L)} \mid v \in V }}),    (3)

where f_{readout} is a (parameterized) function. GNNs classify graph G based on the graph embedding h_G.

Typical GNN models can be formulated in the message passing framework as follows.

GraphSAGE-mean (Hamilton et al., 2017b).

  f_{aggregate}^{(k)}({{ h_u^{(k-1)} \mid u \in N(v) }}) = \frac{1}{deg(v)} \sum_{u \in N(v)} h_u^{(k-1)},
  f_{update}^{(k)}(h_v^{(k-1)}, a_v^{(k)}) = \sigma(W^{(k)} [h_v^{(k-1)}, a_v^{(k)}]),

where W^{(k)} is a parameter matrix and \sigma is an activation function such as the sigmoid function and ReLU.

Graph Convolutional Networks (GCNs) (Kipf and Welling, 2017).

  f_{aggregate}^{(k)}({{ h_u^{(k-1)} \mid u \in N(v) }}) = \sum_{u \in N(v)} \frac{h_u^{(k-1)}}{\sqrt{deg(v) deg(u)}},
  f_{update}^{(k)}(h_v^{(k-1)}, a_v^{(k)}) = \sigma(W^{(k)} a_v^{(k)}).

Graph Attention Networks (GATs) (Veličković et al., 2018).

  \alpha_{vu}^{(k)} = \frac{\exp(LeakyReLU(a^{(k)\top} [W^{(k)} h_v^{(k-1)}, W^{(k)} h_u^{(k-1)}]))}{\sum_{u' \in N(v)} \exp(LeakyReLU(a^{(k)\top} [W^{(k)} h_v^{(k-1)}, W^{(k)} h_{u'}^{(k-1)}]))},
  f_{aggregate}^{(k)}({{ h_u^{(k-1)} \mid u \in N(v) }}) = \sum_{u \in N(v)} \alpha_{vu}^{(k)} h_u^{(k-1)},
  f_{update}^{(k)}(h_v^{(k-1)}, a_v^{(k)}) = \sigma(W^{(k)} a_v^{(k)}).

Technically, in these models, f_{aggregate}^{(k)} is not a function of {{ h_u^{(k-1)} \mid u \in N(v) }} alone but uses side information such as the degrees of the neighboring nodes and attention weights. However, such information can be considered to be included in the message h_u^{(k-1)}. Thus, this abuse of notation does not affect the class of functions that these models can compute. Many other examples of message passing GNNs are provided by Gilmer et al. (2017).
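To make the message passing formulation concrete, the following NumPy sketch implements Eqs. (1)-(3) with the GraphSAGE-mean aggregation and update given above. The toy path graph, the random weight matrices, the choice of ReLU as \sigma, and the sum readout are arbitrary illustrative choices, not choices prescribed by the survey.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graphsage_mean_layer(H, adj, W):
    """One message-passing layer (Eqs. (1)-(2)) with GraphSAGE-mean.

    H   : (n, d) matrix whose v-th row is h_v^{(k-1)}.
    adj : adjacency lists, adj[v] = list of neighbors N(v).
    W   : (d_out, 2 * d) parameter matrix W^{(k)}.
    """
    n, d = H.shape
    H_new = np.zeros((n, W.shape[0]))
    for v in range(n):
        # Eq. (1): a_v is the mean of the multiset of neighbor embeddings.
        a_v = np.mean(H[adj[v]], axis=0) if adj[v] else np.zeros(d)
        # Eq. (2): h_v = sigma(W [h_v, a_v]) with concatenation.
        H_new[v] = relu(W @ np.concatenate([H[v], a_v]))
    return H_new

def readout_sum(H):
    """Eq. (3): a simple permutation-invariant readout (sum over nodes)."""
    return H.sum(axis=0)

# Toy usage on a 4-node path graph 0-1-2-3 with 2-dimensional features.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2 * 2))   # layer-1 weights (illustrative)
W2 = rng.normal(size=(4, 2 * 4))   # layer-2 weights (illustrative)

H = X                               # h_v^{(0)} = x_v
H = graphsage_mean_layer(H, adj, W1)
H = graphsage_mean_layer(H, adj, W2)
print(readout_sum(H))               # graph embedding h_G
```

Replacing the mean with the GCN or GAT rules above only changes f_{aggregate}; the surrounding message-passing loop stays the same.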
2 Graphs That GNNs Cannot Distinguish

In this section, we discuss graphs that vanilla GNNs cannot distinguish via elementary arguments and concrete examples. A k-regular graph is a graph in which each node has exactly k neighboring nodes. Decaprismane C20H20 (Schultz, 1965) and dodecahedrane C20H20 (Paquette et al., 1983) are examples of 3-regular graphs, as illustrated in Figure 3.

Figure 3: GNNs cannot distinguish these two molecules because both are 3-regular graphs with 20 nodes. (a) Decaprismane. (b) Dodecahedrane.

It is easy to see that message passing GNNs cannot distinguish k-regular graphs with the same size and identical node features. This phenomenon is illustrated in Figure 4. These two graphs are not isomorphic because the right graph contains triangles, but the left graph does not.

[...]

The k-dimensional WL algorithm is a generalization of the 1-dimensional WL algorithm. This algorithm assigns a color to each k-tuple of nodes.

k-dimensional WL (k-WL) algorithm
Input: A pair of graphs G = (V, E, X) and H = (U, F, Y).
1. c_v^{(0)} \leftarrow Hash(G[v])  (v \in V^k)
2. d_u^{(0)} \leftarrow Hash(H[u])  (u \in U^k)
3. for l = 1, 2, ... (until convergence)
   (a) if {{ c_v^{(l-1)} \mid v \in V^k }} \neq {{ d_u^{(l-1)} \mid u \in U^k }}, return "non-isomorphic"
   (b) c_{v,i}^{(l)} \leftarrow {{ c_w^{(l-1)} \mid w \in N_{G,i}^{k-WL}(v) }}  (v \in V^k, i \in [k])
   (c) c_v^{(l)} \leftarrow Hash(c_v^{(l-1)}, c_{v,1}^{(l)}, c_{v,2}^{(l)}, ..., c_{v,k}^{(l)})  (v \in V^k)
   (d) d_{u,i}^{(l)} \leftarrow {{ d_w^{(l-1)} \mid w \in N_{H,i}^{k-WL}(u) }}  (u \in U^k, i \in [k])
   (e) d_u^{(l)} \leftarrow Hash(d_u^{(l-1)}, d_{u,1}^{(l)}, d_{u,2}^{(l)}, ..., d_{u,k}^{(l)})  (u \in U^k)
4. return "possibly isomorphic"

where N_{G,i}^{k-WL}(v_1, v_2, ..., v_k) = { (v_1, ..., v_{i-1}, w, v_{i+1}, ..., v_k) \mid w \in V } is the i-th neighborhood, which replaces the i-th element of a k-tuple with every node of G. Hash is an injective hash function, which assigns the same color to the same isomorphism type. In other words, Hash(G[v^1]) = Hash(G[v^2]) if and only if (1) x_{v^1_i} = x_{v^2_i} for all i \in [k] and (2) {v^1_i, v^1_j} \in E if and only if {v^2_i, v^2_j} \in E for all i, j \in [k]. The same holds for Hash(H[u^1]) and Hash(H[u^2]), and for Hash(G[v]) and Hash(H[u]).

The k-dimensional folklore WL algorithm is another generalization of the 1-dimensional WL algorithm.

k-dimensional folklore WL (k-FWL) algorithm
1. c_v^{(0)} \leftarrow Hash(G[v])  (v \in V^k)
2. d_u^{(0)} \leftarrow Hash(H[u])  (u \in U^k)
3. for l = 1, 2, ... (until convergence)
   (a) if {{ c_v^{(l-1)} \mid v \in V^k }} \neq {{ d_u^{(l-1)} \mid u \in U^k }}, return "non-isomorphic"
   (b) c_{v,w}^{(l)} \leftarrow (c_{v[1 \leftarrow w]}^{(l-1)}, c_{v[2 \leftarrow w]}^{(l-1)}, ..., c_{v[k \leftarrow w]}^{(l-1)})  (v \in V^k, w \in V)
   (c) c_v^{(l)} \leftarrow Hash(c_v^{(l-1)}, {{ c_{v,w}^{(l)} \mid w \in V }})  (v \in V^k)
   (d) d_{u,w}^{(l)} \leftarrow (d_{u[1 \leftarrow w]}^{(l-1)}, d_{u[2 \leftarrow w]}^{(l-1)}, ..., d_{u[k \leftarrow w]}^{(l-1)})  (u \in U^k, w \in U)
   (e) d_u^{(l)} \leftarrow Hash(d_u^{(l-1)}, {{ d_{u,w}^{(l)} \mid w \in U }})  (u \in U^k)
4. return "possibly isomorphic"

where c_{(v_1, v_2, ..., v_k)[i \leftarrow w]} = c_{(v_1, ..., v_{i-1}, w, v_{i+1}, ..., v_k)}. k-WL and k-FWL are also sound but not complete. In other words, if k-WL or k-FWL outputs "non-isomorphic", then G and H are not isomorphic, but even if k-WL or k-FWL outputs "possibly isomorphic", it is possible that G and H are not isomorphic.
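The k-WL algorithm above generalizes the 1-dimensional WL algorithm, which refines a single color per node rather than per k-tuple. The following sketch assumes the standard color-refinement formulation of 1-WL (a canonical tuple of the node's own color and the sorted neighbor colors plays the role of the injective Hash); the specific pair of graphs is an illustrative choice of the kind discussed in Section 2, a 6-cycle versus two disjoint triangles, which are non-isomorphic 2-regular graphs with identical node features.

```python
def wl_1dim(adj, num_rounds=10):
    """1-dimensional WL color refinement on a graph given as adjacency lists.

    Returns the multiset (as a sorted list) of node colors after refinement.
    The canonical tuple (own color, sorted multiset of neighbor colors) acts
    as the injective Hash function.
    """
    colors = {v: 0 for v in adj}            # identical initial features
    for _ in range(num_rounds):
        new_colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Re-index the refined colors so they stay compact integers.
        palette = {c: i for i, c in enumerate(sorted(set(new_colors.values())))}
        refined = {v: palette[new_colors[v]] for v in adj}
        if refined == colors:               # converged
            break
        colors = refined
    return sorted(colors.values())

# A 6-cycle (no triangles) ...
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# ... and two disjoint triangles (contains triangles). Both are 2-regular.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# The color multisets coincide, so the test answers "possibly isomorphic"
# even though the two graphs are not isomorphic.
assert wl_1dim(cycle6) == wl_1dim(two_triangles)
print("1-WL cannot distinguish the 6-cycle from two disjoint triangles.")
```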
It should be noted that the folklore WL algorithm is sometimes referred to as the WL algorithm in the theoretical computer science literature. Several relations are known about the capability of the variants of the WL algorithm.

- 1-WL is as powerful as 2-WL. In other words, for any pair of graphs, the outputs of both algorithms are the same (see, e.g., Cai et al. (1992); Grohe and Otto (2015b); Grohe (2017)).
- For all k \geq 2, k-FWL is as powerful as (k+1)-WL (see, e.g., Grohe and Otto (2015b); Grohe (2017)).
- For all k \geq 2, (k+1)-WL is strictly more powerful than k-WL. In other words, there exists a pair of non-isomorphic graphs (G, H) such that k-WL outputs "possibly isomorphic" but (k+1)-WL outputs "non-isomorphic" (see, e.g., Grohe and Otto (2015b, Observation 5.13 and Theorem 5.17)).

[...]

For example, suppose mobile devices construct a communication network with nearby devices. The communication network must contain hub nodes to control communications, and the hubs must form a vertex cover of the network (i.e., each edge is covered by at least one hub node). The number of hub nodes should be as small as possible, but each mobile device has to decide whether to declare itself a hub within five communication rounds. How can we design an algorithm for these devices? Figure 10 illustrates this problem.

Figure 10: Illustration of distributed local algorithms. (b) Each node answers after a constant number of communications.

Distributed local algorithms were first studied by Angluin (1980), Linial (1992), and Naor and Stockmeyer (1995). Angluin (1980) introduced a port numbering model and showed that deterministic distributed algorithms cannot find a center of a graph without unique node identifiers. Linial (1992) showed that no distributed local algorithms can solve 3-coloring of cycles, and that \Omega(\log^* n) communication rounds are required for distributed algorithms to solve the problem. Naor and Stockmeyer (1995) showed positive results for distributed local algorithms for the first time. For example, distributed local algorithms can find a weak 2-coloring and solve a variant of the dining philosophers problem. Later, several non-trivial distributed local algorithms were found, including a 2-approximation algorithm for the minimum vertex cover problem (Åstrand et al., 2009). It should be noted that although the classical 2-approximation algorithm for the minimum vertex cover problem is simple, it is not trivial to compute a 2-approximation solution to the minimum vertex cover problem in a distributed way. An extensive survey on distributed local algorithms is provided by Suomela (2013).

There are many computational models of distributed algorithms. Among the computational models of distributed local algorithms, the standard computational model uses a port numbering. In this model, each node v has deg(v) ports, and each edge incident to v is connected to exactly one of these ports.

Figure 11: (a) An illustration of a port numbering. (b) MB-GNNs send the same message to all the neighboring nodes. (c) VVC-GNNs send different messages to the neighboring nodes.
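Assuming the standard semantics of the port-numbering model (each node of degree deg(v) numbers its incident edges 1, ..., deg(v), may send a different message on each port in every synchronous round, and sees only the port on which a message arrives, never the identity of the sender), the following sketch simulates such a constant-round algorithm. The helper names, the toy graph, and the particular send/update rules are illustrative assumptions, not an algorithm given in the survey.

```python
def build_ports(adj):
    """Assign port numbers: port i of node v leads to its i-th neighbor.

    Returns a dict mapping (v, port) -> (u, port_at_u), i.e., the other
    endpoint of the edge and the port through which u sees this edge.
    """
    port_of = {}   # (v, u) -> port number of the edge {v, u} at v
    for v, neighbors in adj.items():
        for i, u in enumerate(neighbors, start=1):
            port_of[(v, u)] = i
    return {
        (v, port_of[(v, u)]): (u, port_of[(u, v)])
        for v in adj for u in adj[v]
    }

def run_local_algorithm(adj, init_state, send, update, rounds):
    """Simulate a deterministic distributed local algorithm in the
    port-numbering model: no node identifiers, per-port messages, and a
    constant number of synchronous communication rounds."""
    links = build_ports(adj)
    state = {v: init_state(len(adj[v])) for v in adj}
    for _ in range(rounds):
        # Each node places one outgoing message on each of its ports ...
        outbox = {(v, p): send(state[v], p) for (v, p) in links}
        # ... and then updates its state from the messages received per port.
        inbox = {v: {} for v in adj}
        for (v, p), (u, q) in links.items():
            inbox[u][q] = outbox[(v, p)]
        state = {v: update(state[v], inbox[v]) for v in adj}
    return state

# Toy usage: a one-round algorithm in which every node learns its own degree
# and the multiset of its neighbors' degrees (purely illustrative).
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
final = run_local_algorithm(
    adj,
    init_state=lambda deg: deg,        # a node initially knows only deg(v)
    send=lambda st, port: st,          # here: same message on every port
    update=lambda st, inbox: (st, tuple(sorted(inbox.values()))),
    rounds=1,
)
print(final)   # e.g. node 1 ends with (3, (1, 1, 1))
```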
