## Overview

**PegasusN** is a **Peta-scale graph mining system**, fully written in Java.
It runs in a parallel, distributed manner on top of Hadoop and Spark.

PegasusN provides large-scale algorithms for important graph mining tasks:

- Graph Structure Analyses
  - PageRank
  - Random Walk with Restart (RWR)
  - Radius/Diameter
  - Degree
  - Single Source Shortest Path (SSSP)
  - Label Propagation
  - Connected Component
- Subgraph Enumeration
  - Triangle
  - Cliques
  - Other pattern graphs
- Graph Generation
  - R-MAT
  - Kronecker
  - Random
- Graph Visualization
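
To give a flavor of one of the tasks listed above, the following is a minimal single-machine sketch of PageRank by power iteration. It is not PegasusN's API or its distributed implementation: the class name `PageRankSketch`, the method `pageRank`, the edge-list input format, and the choice to spread the rank of nodes without out-edges uniformly over all nodes are assumptions made only for illustration.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical in-memory PageRank, not PegasusN code.
public class PageRankSketch {

    /**
     * Computes PageRank by power iteration on a directed edge list.
     * edges[i] = {source, target}; nodes are numbered 0..n-1.
     */
    public static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        // Start from the uniform distribution.
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Assumption: rank held by nodes without out-edges is spread evenly.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            // Every node receives the teleport share plus its share of dangling rank.
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);

            // Each edge passes an equal fraction of its source's rank to its target.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 0}, {3, 2}};
        double[] rank = pageRank(4, edges, 0.85, 50);
        for (int v = 0; v < rank.length; v++) {
            System.out.printf("node %d: %.4f%n", v, rank[v]);
        }
    }
}
```

Each iteration is one pass over the edge list, which is the kind of step a distributed system can spread across many machines when the graph no longer fits on one.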

The details of PegasusN can be found in the following paper:

Ha-Myung Park, Chiwan Park, U Kang.

PegasusN: A Scalable and Versatile Graph Mining System.

AAAI 2018, New Orleans, Louisiana, USA.
(Demo Paper)

## Graph Mining with PegasusN

**Graph Mining** is an area of data mining that finds patterns, rules, and anomalies in graphs.

### Why Should We Care?

Graphs and networks are everywhere: the Internet Web graph, social networks (Facebook, Twitter), biological networks, and many more. Finding patterns, rules, and anomalies in them has numerous applications, including but not limited to the following:

- Ranking web pages by search engines
- 'Viral' or 'word-of-mouth' marketing
- Patterns of disease with potential impact on drug discovery
- Computer network security: email/IP traffic and anomaly detection

### Why PegasusN?

Existing work on graph mining has limited scalability: the maximum graph size is usually on the order of millions. PegasusN breaks this limit by scaling the algorithms up to billion-scale graphs. The breakthrough was made possible by careful algorithm design and implementation for Hadoop, a massive cloud computing platform. To summarize, PegasusN has three major advantages.

**Large Graph Mining Package**

Graphs with billions of nodes and edges

**Parallel Algorithms on Hadoop and Spark**

Massive cloud computing platform

**Open Source**