MapReduce Triangle Enumeration
with Guarantees

We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high probability) bounds on the work needed in each reducer, addressing the "curse of the last reducer". Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph.

Overview

Our experimental evaluation shows the scalability of our approach, that it is competitive with existing methods improving the performance by a factor up to 2 times, and that it can significantly increase the size of datasets that can be processed.
Fast
Scalable
Configurable


Paper

CTTP is described in the following paper.

Code

The binary codes used in this paper are available. [Download]

Datasets

Name#Nodes#Edges#TrianglesDescriptionSource
LiveJournal4.8M69M285730264 LiveJournal online social network SNAP - LiveJornal1
PhoneCall30M23M149045937 Phone call records on Dec.2007 and Jan.2008 Private data
Twitter42M1.2B34824916864 Following network of Twitter Kwak10www - Twitter
SubDomain0.1B1.9B417761664336 Links among subdomains on the Web WDC - Subdomain/Host Graph
YahooWeb1.4B6.6B85782928684 Page level hyperlink network on the Web Yahoo-webscope
ClueWeb094.8B7.9B31013037486 Page level hyperlink network on the Web ClueWeb09 Wiki


People