Revisiting Link Prediction: A Data Perspective

AI-generated keywords: Link Prediction, Local Structural Proximity, Global Structural Proximity, Feature Proximity, Graph Neural Networks

AI-generated Key Points

  • Link prediction is a fundamental task in various applications such as friend recommendation, protein analysis, and drug interaction prediction.
  • Datasets in these domains can have distinct underlying mechanisms of link formation, making it challenging to find a universally best algorithm suitable for all datasets.
  • Three critical factors for link prediction are local structural proximity (LSP), global structural proximity (GSP), and feature proximity (FP).
  • GSP is more effective when LSP is deficient, indicating the importance of global structural information when there are limited local connections between nodes.
  • Feature proximity is incompatible with structural proximity: on edges where feature proximity dominates, graph neural networks for link prediction (GNN4LP) consistently underperform.
  • Practical instructions for designing GNN4LP models and guidelines for selecting appropriate benchmark datasets are provided based on these insights.
  • The paper discusses limitations of the study and potential broader impacts.
  • The findings contribute to advancing our understanding of link formation mechanisms across diverse domains.

Authors: Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang

36 pages, 12 figures
License: CC BY 4.0

Abstract: Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.

Submitted to arXiv on 01 Oct. 2023

Results of the summarizing process for the arXiv paper: 2310.00793v1

The paper titled "Revisiting Link Prediction: A Data Perspective" explores the principles of link prediction on graphs from a data-centric perspective. Link prediction is a fundamental task in applications such as friend recommendation, protein analysis, and drug interaction prediction. However, datasets in these domains can have distinct underlying mechanisms of link formation, which makes it difficult to find a single algorithm that performs best on all of them.

The authors recognize three critical factors for link prediction: local structural proximity (LSP), global structural proximity (GSP), and feature proximity (FP). They aim to understand the relationships among these factors and their impact on link prediction performance. Through empirical and theoretical analysis, they report two key findings. First, GSP is effective mainly when LSP is deficient; in other words, global structural information matters most when there are few local connections between two nodes. Second, feature proximity is incompatible with structural proximity: on edges where feature proximity dominates, graph neural networks for link prediction (GNN4LP) consistently underperform.

Based on these insights, the authors provide practical instructions for designing GNN4LP models and guidelines for selecting appropriate benchmark datasets to ensure more comprehensive evaluations. They also discuss the limitations of their study and its potential broader impacts. Overall, the findings contribute to a better understanding of link formation mechanisms across diverse domains.
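
To make the three factors concrete, here is a minimal Python sketch that scores a candidate node pair with one simple heuristic per factor. The specific heuristics (common neighbors for LSP, a truncated Katz index for GSP, cosine similarity for FP), the toy graph, and all function and variable names are assumptions chosen for illustration; this page does not state how the paper measures each factor.

```python
import math

def common_neighbors(neighbors, u, v):
    """LSP stand-in (assumed): number of neighbors shared by u and v."""
    return len(neighbors[u] & neighbors[v])

def truncated_katz(neighbors, u, v, beta=0.05, max_len=6):
    """GSP stand-in (assumed): sum over l = 1..max_len of
    beta**l * (number of walks of length l from u to v)."""
    walks = {u: 1}  # walks[w] = number of walks of the current length from u to w
    score = 0.0
    for length in range(1, max_len + 1):
        next_walks = {}
        for w, count in walks.items():
            for nb in neighbors[w]:
                next_walks[nb] = next_walks.get(nb, 0) + count
        walks = next_walks
        score += (beta ** length) * walks.get(v, 0)
    return score

def feature_cosine(features, u, v):
    """FP stand-in (assumed): cosine similarity of the two feature vectors."""
    dot = sum(a * b for a, b in zip(features[u], features[v]))
    norm_u = math.sqrt(sum(a * a for a in features[u]))
    norm_v = math.sqrt(sum(b * b for b in features[v]))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

# Hypothetical toy graph: a path 0 - 1 - 2 - 3 with 2-dimensional node features.
toy_neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [1.0, 0.0]}

# Nodes 0 and 3 share no neighbors, so the local score is zero, while the
# walk-based global score and the feature score are both positive.
lsp_score = common_neighbors(toy_neighbors, 0, 3)
gsp_score = truncated_katz(toy_neighbors, 0, 3)
fp_score = feature_cosine(toy_features, 0, 3)
print(lsp_score, gsp_score, fp_score)
```

On this toy pair the local heuristic carries no signal, so only the global and feature scores can separate it from other candidates. That is the kind of situation the first finding describes, although the sketch itself demonstrates nothing about real datasets.
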
Created on 14 Nov. 2023

