How can we enumerate triangles from an enormous graph with billions of vertices and edges? Triangle enumeration is an important task for graph data analysis with many applications including identifying suspicious users in social networks, detecting web spams, finding communities, etc. However, recent networks are so large that most of the previous algorithms fail to process them. Recently, several MapReduce algorithms have been proposed to address such large networks; however, they suffer from the massive shuffled data resulting in a very long processing time.

In this paper, we propose PTE (*Pre-partitioned Triangle Enumeration*), a new distributed algorithm for enumerating triangles in enormous graphs by resolving the structural inefficiency of the previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion scale graph by decreasing three factors: the amount of shuffled data, total work, and network read.
Experimental results show that PTE provides up to 47 times faster performance than recent distributed algorithms on real world graphs, and succeeds in enumerating more than 3 *trillion* triangles on the ClueWeb12 graph with 6.3 billion vertices and 72 billion edges, which any previous triangle computation algorithm fail to process.

The running time of proposed methods (PTE_{SC}, PTE_{CD}, PTE_{BASE}) and competitors (CTTP, MGT, GraphLab) on real world datasets (log scale).
GraphX is not shown since it failed to process any of the datasets.
Missing methods for some datasets mean they failed to run on the datasets.
PTE_{SC} shows the best performances outperforming CTTP and MGT by up to 47 times and 17 times, respectively. Only the proposed algorithms succeed in processing the ClueWeb12 graph containing 6.3 billion vertices and 72 billion edges.

@inproceedings{conf/kdd/ParkMK16,
author = {Ha{-}Myung Park and
Sung{-}Hyon Myaeng and
U. Kang},
title = {{PTE:} Enumerating Trillion Triangles On Distributed Systems},
booktitle = {KDD},
pages = {1115--1124},
year = {2016}
}

Name | #Nodes | #Edges | #Triangles | Description | Source |
---|---|---|---|---|---|

42M | 1.2B | 34824916864 | Who-follows-whom in Twitter | Kwak10www - Twitter | |

SubDomain | 0.1B | 1.9B | 417761664336 | Links among subdomains on the Web | WDC - Subdomain/Host Graph |

YahooWeb | 1.4B | 6.6B | 85782928684 | Page level hyperlink network on the Web | Yahoo-webscope |

ClueWeb09 | 4.8B | 7.9B | 31013037486 | Page level hyperlink network on the Web | ClueWeb09 Wiki |

ClueWeb12 | 6.3B | 72B | 3058034046618 | Page level hyperlink network on the Web | ClueWeb12 Web Graph |

- Ha-Myung Park (Korea Advanced Institute of Science and Technology, Korea)
- Sung-Hyon Myaeng (Korea Advanced Institute of Science and Technology, Korea)
- U Kang (Seoul National University, Korea)