DSlashBurn: A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs

Overview

How can we effectively compress big graphs composed of billions of edges? By concentrating non-zeros in the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently. Also, we can boost the performance of several graph mining algorithms such as PageRank and Random Walk with Restart. SlashBurn is a state-of-the-art vertex rearrangement method. It processes real-world graphs effectively by utilizing the power-law characteristic of the real-world networks. However, the original SlashBurn algorithm displays a noticeable slowdown for large-scale graphs, and cannot be used at all when graphs are too large to fit in a single machine since it is designed to run on a single machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original SlashBurn algorithm does. In addition, it scales up well by performing the large-scale vertex rearrangement process in a distributed fashion. In our experiments using real-world big graphs, the proposed distributed SlashBurn algorithm was found to run more than 45 times faster than the single machine counterpart, and process graphs that are 16 times bigger compared to the original method.

Paper

- A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs
  Namyong Park, Chiwan Park, and U Kang.
  Journal of KIISE, Vol. 43, No. 10, pp. 1131-1143, 2016.
  [PDF]

Code

The binary code of DSlashBurn will soon be available for download.

Datasets

Name # Vertices # Edges Description Source Download
Amazon 334,863 925,872 Amazon product co-purchasing network SNAP Link
Web-BS 685,230 7,600,595 Berkeley-Stanford web graph SNAP Link
LiveJournal-Links 5,204,176 49,174,620 LiveJournal friendship network KONECT Link
LiveJournal-Membership 10,690,276 112,307,385 LiveJournal membership network KONECT Link
Orkut 11,514,053 327,037,487 Orkut membership network KONECT Link
Friendster 65,608,366 1,806,067,135 Friendster friendship network SNAP Link
R-MAT 65,536 ~
33,554,432
4,194,304 ~
2,147,483,648
Synthetic graphs

People

Namyong Park
Department of Computer Science and Engineering
Seoul National University
Chiwan Park
Department of Computer Science and Engineering
Seoul National University
U Kang
Department of Computer Science and Engineering
Seoul National University
Copyright © 2016, By Data Mining Laboratory, Department of Computer Science and Engineering, Seoul National University, All Rights Reserved.