PegasusN provides large scale algorithms for important graph mining tasks:
- Graph Structure Analyses
- Random Walk with Restart (RWR)
- Single Source Shortest Path (SSSP)
- Label Propagation
- Connected Component
- Subgraph Enumeration
- Other pattern graphs
- Graph Generation
- Graph Visualization
The details of PegasusN can be found in the following paper:
Ha-Myung Park, Chiwan Park, U Kang.
PegasusN: A Scalable and Versatile Graph Mining System.
AAAI 2018, New Orleans, Lousiana, USA. (Demo Paper)
The sources and documentation are distributed under the 3-clause BSD license.
Graph Mining with PegasusN
Graph Mining is an area of data mining to find patterns, rules, and anomalies of graphs.
Why Should We Care?Graphs or networks are everywhere, ranging from the Internet Web graph, social networks(FaceBook, Twitter), biological networks, and many more. Finding patterns, rules, and anomalies have numerous applications including, but not limited to, the followings:
- Ranking web pages by search engine
- 'viral' or 'word-of-mouth' marketing
- Patterns of disease with potential impact for drug discovery
- Computer network security: email/IP traffic and anomaly detection
Why PegasusN?Existing works on graph mining has limited scalability: usually, the maximum graph size is order of millions. PegasusN breaks the limit by scaling up the algorithms to billion-scale graphs. The breakthrough was possible by the careful algorithm design and implementation for Hadoop, a massive cloud computing platform. To summarize, PegasusN has three major advantages.
- Large Graph Mining Package
Graphs with billions of nodes and edges
- Parallel Algorithms on Hadoop and Spark
Massive cloud computing platform
- Open Source