Deep Graph Representation Learning: Scalability and Applications

Date
2023-08-10
Abstract

The ubiquity of graphs in science and industry has motivated the development of graph representation learning algorithms, among which graph neural networks (GNNs) have emerged as one of the predominant computational tools. In general, GNNs apply a recursive message passing mechanism to learn the representation of each node by aggregating the representations of the node itself and its neighbors. Despite the promising results GNNs have achieved in many fields, their scalability and applicability remain too limited for complex, large-scale graph data. We consider the scalability of GNNs from two perspectives: model depth and data processing. First, GNNs are typically limited to fewer than three layers, which prevents them from effectively modeling high-order neighborhood dependencies. Second, GNNs are notorious for their memory and computation bottlenecks on large graphs, which contain enormous numbers of nodes and edges. Third, although many GNN prototypes have been proposed on benchmark datasets, it is not straightforward to apply GNNs to a new application at hand that requires specific domain knowledge.
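
For orientation, the message passing described above can be sketched as a single GCN-style layer. The following NumPy snippet is a minimal illustration under generic assumptions (the symmetric normalization and the names A, H, W are illustrative, not the specific architecture studied in this thesis):

    import numpy as np

    def message_passing_layer(A, H, W):
        """One GCN-style layer: aggregate neighbor features, then transform.

        A: (n, n) adjacency matrix with self-loops already added
        H: (n, d_in) node representations
        W: (d_in, d_out) learnable weights
        """
        # Symmetric degree normalization so high-degree nodes do not dominate
        deg = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
        A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        # Each node mixes its own representation with its neighbors', then ReLU
        return np.maximum(A_hat @ H @ W, 0.0)

Stacking such layers recursively is what lets each node see higher-order neighborhoods, and is also where the depth and memory bottlenecks discussed above arise.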

To address these challenges, I have devoted my work to a series of studies that advance the optimization of deep GNNs, efficient training on large graphs, and their well-performing applications. Part I aims at scaling up the model depth of graph neural architectures to learn complex neighborhood structures. At the theoretical level, we analyze the over-smoothing issue in deep models, where the node representation vectors across the graph converge to similar embeddings. At the algorithmic level, we develop a set of novel tricks, including normalization, skip connections, and weight regularization, to tackle over-smoothing. At the benchmark level, we develop the first platform to comprehensively incorporate the existing tricks, evaluate them fairly, and propose a new deep GNN model with superior generalization performance across tens of benchmark datasets.
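
As a rough illustration of how such tricks compose, the sketch below combines a skip connection and row-wise embedding normalization in one layer. It is a generic example of these two tricks, not the exact formulation proposed in Part I (the mixing weight alpha and the helper names are assumptions):

    import numpy as np

    def deep_gnn_layer(A_hat, H, W, alpha=0.1):
        """One layer of a deep GNN with two common anti-over-smoothing tricks.

        A_hat: (n, n) pre-normalized adjacency matrix
        H:     (n, d) node embeddings entering the layer
        W:     (d, d) weight matrix
        alpha: skip-connection strength back to the layer input
        """
        Z = np.maximum(A_hat @ H @ W, 0.0)
        # Skip connection: keep part of the layer input so embeddings do not
        # collapse toward a single vector as depth grows
        Z = (1.0 - alpha) * Z + alpha * H
        # Row-wise normalization keeps embedding scales comparable across layers
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        return Z / np.maximum(norms, 1e-12)

The skip connection counteracts over-smoothing by re-injecting the input signal at every layer, while normalization prevents the embedding magnitudes from drifting as depth increases.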

In Part II, we present algorithms to enhance GNNs' scalability on large-scale graph datasets. We propose a novel training paradigm, graph isolated training, which partitions the large graph into many small clusters and trains an expert GNN on each of them. By cutting off inter-cluster communication, our solution significantly accelerates training while maintaining node classification accuracy. We also analyze the label bias issue in small mini-batches, which can lead GNNs to overfit. An adaptive label smoothing method is then designed to address the label bias and improve the model's generalization performance.
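
A minimal sketch of the cluster-wise training idea follows, assuming a precomputed node-to-cluster assignment (e.g., from a METIS-style partitioner) and a generic train_fn routine; these names are hypothetical, and the dissertation's actual procedure may differ:

    import numpy as np

    def cluster_isolated_training(A, X, y, assign, train_fn):
        """Train one expert GNN per cluster, dropping inter-cluster edges.

        A:        (n, n) adjacency matrix of the full graph
        X:        (n, d) node features
        y:        (n,) node labels
        assign:   (n,) cluster id per node, e.g. from a graph partitioner
        train_fn: user-supplied routine that trains one GNN on a subgraph
        """
        experts = {}
        for c in np.unique(assign):
            idx = np.where(assign == c)[0]
            # Keep only intra-cluster edges: inter-cluster communication is cut,
            # so each expert trains on a small, self-contained subgraph
            A_sub = A[np.ix_(idx, idx)]
            experts[c] = train_fn(A_sub, X[idx], y[idx])
        return experts

Similarly, standard label smoothing, of which the adaptive variant above is a batch-dependent refinement, can be written in a few lines; choosing eps per batch, as the adaptive method would, is not shown here:

    import numpy as np

    def smoothed_labels(y, num_classes, eps=0.1):
        """Mix one-hot targets with a uniform prior to soften hard labels."""
        one_hot = np.eye(num_classes)[y]
        return (1.0 - eps) * one_hot + eps / num_classes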

In Part III, we further explore the broad applications of GNNs. Based on the transfer learning paradigm of “pre-train, prompt, fine-tune”, we design the first graph prompting function. The graph prompt reformulates the downstream task to look the same as the pretext task, so that the pre-trained model transfers easily to the downstream problem. In bioinformatics, we extend GNNs to hierarchically learn molecular graph structures at different levels of abstraction. In tabular data mining, we use GNNs to explicitly learn the feature interactions between columns and make recommendations for each sample. Finally, I discuss future directions in graph machine learning.
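
To make the prompting idea concrete, one generic way to reformulate a downstream task is to attach a learnable prompt node to the input graph so that a frozen pre-trained GNN sees the task in a familiar form. The sketch below is an illustrative design under that assumption, not the specific prompting function proposed in Part III:

    import numpy as np

    def add_prompt_node(A, X, x_prompt):
        """Attach a learnable prompt node connected to every existing node.

        A:        (n, n) adjacency matrix of the downstream graph
        X:        (n, d) node features
        x_prompt: (d,) learnable prompt vector (a hypothetical design choice)
        """
        n = A.shape[0]
        A_new = np.zeros((n + 1, n + 1))
        A_new[:n, :n] = A
        # Link the prompt node to all nodes so its learned features steer
        # the pre-trained model toward the downstream task
        A_new[n, :n] = 1.0
        A_new[:n, n] = 1.0
        X_new = np.vstack([X, x_prompt[None, :]])
        return A_new, X_new

Only x_prompt would be optimized for the downstream task, leaving the pre-trained GNN weights untouched, which is the appeal of the prompting paradigm over full fine-tuning.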

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Deep graph neural networks, large-scale graph machine learning, graph batch bias, graph prompt, molecular graph representation learning.
Citation

Zhou, Kaixiong. "Deep Graph Representation Learning: Scalability and Applications." (2023) Diss., Rice University. https://hdl.handle.net/1911/115249.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.