How can we effectively compress big graphs composed of billions of edges? By concentrating non-zeros in the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently. Also, we can boost the performance of several graph mining algorithms such as PageRank and Random Walk with Restart. SlashBurn is a state-of-the-art vertex rearrangement method. It processes real-world graphs effectively by utilizing the power-law characteristic of the real-world networks. However, the original SlashBurn algorithm displays a noticeable slowdown for large-scale graphs, and cannot be used at all when graphs are too large to fit in a single machine since it is designed to run on a single machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original SlashBurn algorithm does. In addition, it scales up well by performing the large-scale vertex rearrangement process in a distributed fashion. In our experiments using real-world big graphs, the proposed distributed SlashBurn algorithm was found to run more than 45 times faster than the single machine counterpart, and process graphs that are 16 times bigger compared to the original method.
The binary code of DSlashBurn will soon be available for download.
Name | # Vertices | # Edges | Description | Source | Download |
---|---|---|---|---|---|
Amazon | 334,863 | 925,872 | Amazon product co-purchasing network | SNAP | Link |
Web-BS | 685,230 | 7,600,595 | Berkeley-Stanford web graph | SNAP | Link |
LiveJournal-Links | 5,204,176 | 49,174,620 | LiveJournal friendship network | KONECT | Link |
LiveJournal-Membership | 10,690,276 | 112,307,385 | LiveJournal membership network | KONECT | Link |
Orkut | 11,514,053 | 327,037,487 | Orkut membership network | KONECT | Link |
Friendster | 65,608,366 | 1,806,067,135 | Friendster friendship network | SNAP | Link |
R-MAT | 65,536 ~ 33,554,432 |
4,194,304 ~ 2,147,483,648 |
Synthetic graphs |