How can we measure similarity between nodes quickly and accurately on large graphs? Random walk with restart (RWR) provides a good measure, and has been used in various data mining applications including ranking, recommendation, link prediction and community detection. However, existing methods for computing RWR do not scale to large graphs containing billions of edges; iterative methods are slow in query time, and preprocessing methods require too much memory.
In this paper, we propose BePI, a scalable, accurate, and fast method for computing RWR on billion-scale graphs. BePI exploits the best properties from both preprocessing methods and iterative methods. BePI uses a block elimination approach, which is a preprocessing method, to enable fast query time. Also, BePI uses a preconditioned iterative method to decrease memory requirement. The performance of BePI is further improved by decreasing non-zeros of the matrix for the iterative method. Through extensive experiments, we show that BePI processes 100× larger graphs, and requires up to 130× less memory space than other preprocessing methods. In the query phase, BePI computes RWR scores up to 9× faster than existing methods.
BePI is described in the following paper:
The code of the methods used in the paper is available here.
Name | #Nodes | #Edges | Description | Source | Download |
---|---|---|---|---|---|
Slashdot | 79K | 516K | Social network of users in the technology news site Slashdot | DAI-Labor | Link |
Wikipedia | 100K | 1.6M | Hyperlink network between articles of the English Wikipedia | KONECT (The Koblenz Network Collection) | Link |
Baidu | 416K | 3.3M | Hyperlink network between articles of the Chinese online encyclopedia Baidu | Zhishi.me | Link |
Flickr | 2.3M | 33.1M | Flickr friendship network | Max Planck Institute for Software Systems | Link |
LiveJournal | 4.8M | 68.5M | LiveJournal friendship network | Stanford Large Network Dataset Collection | Link |
WikiLink | 11.2M | 340.2M | Wiki-links of the English Wikipedia | Wikimedia | Link |
41.7M | 1.5B | Twitter follower network | Advanced Networking Lab at Kaist | Link | |
Friendster | 68.3M | 2.6B | Friendster friendship network | Internet Archive | Link |