Projects

Big Graph Mining

How can we find patterns and anomalies in large graphs that do not fit in the memory or disks of a single machine? Graphs are everywhere in our lives: social networks, the World Wide Web, biological networks, and many more. These graphs are growing at unprecedented rate, now exceeding billions of nodes and edges. What are the patterns and anomalies in such big graphs? How to design scalable algorithms to discover them? In this project, we work on algorithms, systems, and applications for analyzing large graphs, like social networks or the Web.

Algorithms and Systems:

We develop algorithms and systems for the following areas.

  • Graph processing engine: we develop large graph processing engines and related algorithms to handle very large graphs whose sizes are more than hundreds of billions of edges.
  • Triangle and subgraph mining: we develop distributed algorithms for triangle and subgraph matching and mining.
  • Ranking in graphs (random walk with restart): we develop algorithms for fast random walk with restart exploiting the characteristics of real world graphs.

Applications:

We develop applications in the following areas.

  • Graph over time: temporal trend analysis of real world graphs.
  • Link prediction: predict links that are likely to be made in the future.
  • Community detection: find communities that are groups tightly connected within each other, while loosely connected to other groups.
  • Anomaly detection: find anomalous users and activities in graphs.
  • Visualization and sensemaking: summarize, understand, and sense-make graphs.

Software

  • Pegasus: Peta-Scale Graph Mining System

Publication

  • Ha-Myung Park, Francesco Silvestri, Rasmus Pagh, Chin-wan Chung, Sung-Hyon Myaeng, and U Kang, "Enumerating Trillion Subgraphs On Distributed Systems", ACM Transactions on Knowledge Discovery from Data (TKDD), 2018. (to appear)
  • Minji Yoon, Jinhong Jung, and U Kang, "TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs", 34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France. [PDF]
  • Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang, "SIDE: Representation Learning in Signed Directed Networks", The Web Conference (WWW) 2018, Lyon, France. [PDF] [BIBTEX]
  • Minji Yoon, Woojeong Jin, and U Kang, "Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees", The Web Conference (WWW) 2018, Lyon, France. [PDF] [BIBTEX]
  • Ha-Myung Park, Chiwan Park, and U Kang, "PegasusN: A Scalable and Versatile Graph Mining System", Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) 2018, New Orleans, Lousiana, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Haekyu Park, Jinhong Jung, and U Kang, "A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems", IEEE International Conference on Big Data (BigData) 2017, Boston, MA, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Yongsub Lim, Minsoo Jung, and U Kang, "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs", ACM Transactions on Knowledge Discovery from Data (TKDD), vo. 12, issue 1, Feburuary 2018. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Jinhong Jung, Namyong Park, Lee Sael, and U Kang, "BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart", ACM International Conference on Management of Data (SIGMOD) 2017, Raleigh, North Carolina, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, and U Kang, "Partition Aware Connected Component Computation in Distributed Systems", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang, "Personalized Ranking in Signed Networks using Signed Random Walk with Restart", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Ha-Myung Park, Sung-Hyon Myaeng, and U Kang, "PTE: Enumerating Trillion Triangles On Distributed System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2016, San Francisco, USA. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Minsoo Jung, Sunmin Lee, Yongsub Lim, U Kang, "FURL: Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams", arXiv: 1611.06615 [cs.DS], 26 November 2016. [BIBTEX] [PDF]
  • Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, and Jiwoon Ha, "PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information", ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, USA. [BIBTEX] [PDF]
  • Hugo Gualdron, Robson Cordeiro, Jose Rodrigeus-Jr, Duen Horng (Polo) Chau, Minsuk Kahng, and U Kang., "M-Flash: Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model", European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD) 2016, Riva Del Garda, Italy. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Yongsub Lim, Won-Jo Lee, Ho-Jin Choi, and U Kang, "MTP: Discovering High Quality Partitions in Real World Graphs", World Wide Web Journal [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Jinhong Jung, Kijung Shin, Lee Sael, and U Kang, "Random Walk with Restart on Large Graphs Using Block Elimination", ACM Transactions on Database Systems (TODS), vol. 41, issue 2, pp. 12:1-12:43, June 2016. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • ByungSoo Jeon, Inah Jeon, and U Kang, "TeGViz: Distributed Tera-Scale Graph Generation and Visualization", IEEE International Conference on Data Mining (ICDM) 2015, Atlantic City, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Kijung Shin, Jinhong Jung, Lee Sael, and U Kang, "BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs", ACM International Conference on Management of Data (SIGMOD) 2015, Melbourne, Australia [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "Summarizing and understanding large graphs", Statistical Analysis and Data Mining, doi: 10.1002/sam.11267, 18 May 2015. [BIBTEX] [PDF]
  • Yongsub Lim , Won-Jo Lee , Ho-Jin Choi, and U Kang, "Discovering Large Subsets with High Quality Partitions in Real World Graphs", Second International Conference on Big Data and Smart Computing (BigComp) 2015, Jeju, korea. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Ha-Myung Park, Francesco Silvestri, U Kang, Rasmus Pagh, "MapReduce Triangle Enumeration With Guarantees", 23rd ACM International Conference on Information and Knowledge Management (CIKM) 2014, Shaghai, China [BIBTEX] [HOMEPAGE (CODE, DATASET)] [PDF]
  • Yongsub Lim, U Kang, and Christos Faloutsos, "SlashBurn: Graph Compression and Mining beyond Caveman Communities", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 26, no. 12, pp. 3077-3089, April 2014. [BIBTEX] [CODE] [PDF]
  • Jungeun Kim, Minsoo Choy, Daehoon Kim, and U Kang, "Link Prediction Based on Generalized Cluster Information", 23rd International World Wide Web Conference (WWW) 2014, Seoul, Korea. [BIBTEX] [PDF]
  • U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos, "Net-Ray: Visualizing and Mining Billion-Scale Graphs", Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2014, Tainan, Taiwan. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "VoG: Summarizing and Understanding Large Graphs", SIAM International Conference on Data Mining (SDM) 2014, Philadelphia, Pennsylvania, USA. [BIBTEX] [PDF]
  • Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, U Kang, and Duen Horng Chau, "Interactive Multi-resolution Exploration of Million Node Graphs", IEEE VIS 2013, Atlanta, Georgia, USA. [BIBTEX] [PDF]
  • U Kang, Martial Hebert, and Soonyong Park, "Fast and Scalable Approximate Spectral Graph Matching for Correspondence Problems", Information Sciences, 2013. [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Big graph mining: algorithms and discoveries", ACM SIGKDD Explorations Newsletter Volume 14 Issue 2, December 2012. pp. 29-36. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: An Efficient Analysis Platform for Large Graphs", VLDB Journal, 2012. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, and Jimeng Sun, "Fast Random Walk Graph Kernel", SIAM International Conference on Data Mining (SDM) 2012, Anaheim, California, USA. (acceptance rate 27 %) [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining", IEEE International Conference on Data Mining (ICDM) 2011, Vancouver, Canada. (acceptance rate 12.2 %) [BIBTEX] [CODE] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: A Scalable and General Graph Management System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2011, San Diego, CA, USA. (acceptance rate 17.5 %) [BIBTEX] [PDF]
  • U Kang, Spiros Papadimitriou, Jimeng Sun, and Hanghang Tong, "Centralities in Large Networks: Algorithms and Observations", SIAM International Conference on Data Mining (SDM) 2011, Mesa, Arizona, USA. (acceptance rate 25.1 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "HADI: Mining Radii of Large Graphs", ACM Transactions on Knowledge Discovery from Data (TKDD), 2011. [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: Mining Peta-Scale Graphs", Knowledge and Information Systems (KAIS), Springer, 2011. [BIBTEX] [PDF]
  • U Kang, Mary McGlohon, Leman Akoglu, and Christos Faloutsos, "Patterns on the Connected Components of Terabyte-Scale Graphs", IEEE International Conference on Data Mining (ICDM) 2010, Sydney, Australia. (acceptance rate 19.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations", SIAM International Conference on Data Mining (SDM) 2010, Columbus, Ohio, USA. (acceptance rate 23.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations", IEEE International Conference on Data Mining (ICDM) 2009, Miami, Florida, USA. (acceptance rate 8.9 %) [BIBTEX] [PDF] [PEGASUS HOMEPAGE]
  • Charalampos E. Tsourakakis, U Kang, Gary Miller, and Christos Faloutsos, "DOULION: Counting Triangles in Massive Graphs with a Coin", ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009, Paris, France. (acceptance rate 20 %) [BIBTEX] [PDF]