Given a multigraph stream (e.g., Facebook messages or network traffic) where duplicate edges arrive continuously, how can we accurately and memory-efficiently estimate local triangles for all nodes? Local triangle counting in a graph stream is one of the fundamental tasks in graph mining with important applications including anomaly detection, social role identification, community detection, etc. Many recent graph streams include duplicate edges, hence form multigraph streams: e.g., many network packets might have a same (source, destination) pair. Although there have been several local triangle counting methods for multigraph streams, they have problems in terms of accuracy and memory efficiency; furthermore, most methods support either binary or weighted counting, and thus cannot find anomalies whose detection requires both types of counting. In this paper, we propose FURL, a memory-efficient and accurate local triangle counting method for multigraph streams. FURL has two main advantages. First, FURL improves accuracy by 1) reducing the variance of its estimation via a regularization strategy, and 2) sampling more triangles than the state-of-the-art methods do, by using memory space efficiently. Second, FURL finds anomalies which state-of-the-art methods cannot discover. Experimental results show that FURL outperforms state-of-the-art methods in terms of accuracy and memory efficiency. Thanks to FURL, we discover interesting anomalies from a Bitcoin network.
Name | # Nodes | # Edges | Description | Name | # Nodes | # Edges | Description |
---|---|---|---|---|---|---|---|
45,813 | 855,542 | Wall posts on other user's wall on Facebook | DBLP | 1,314,050 | 18,986,618 | Co-author network in DBLP | |
Enron | 86,978 | 1,134,990 | Enron email network | Bitcoin | 6,297,539 | 28,143,065 | Bitcoin transaction network |
WikiGER | 506,174 | 4,555,759 | Communication network of the German Wikipedia | Livejournal | 4,847,571 | 68,475,391 | Social network of LiveJournal |