프로젝트

대용량 그래프 마이닝

단일 머신의 메모리나 디스크에서 다룰 수 없는 크기의 대용량 그래프에서 어떻게 패턴이나 이상 신호를 찾을 수 있을까? 그래프는 우리 생활 어디에서나 찾아볼 수 있으며 그 예로는 소셜 네트워크, 인터넷, 생물학 네트워크 등이 있다. 이런 그래프들의 크기는 전례 없는 비율로 증가하고 있으며 현재 그래프의 정점과 간선의 수는 수십억을 넘고 있다. 이러한 대용량 그래프에서의 패턴과 이상 신호는 무엇일까? 그것들을 찾기 위해 확장성 있는 알고리즘을 어떻게 만들어야 할까? 이 프로젝트에서는 소셜 네트워크와 웹과 같은 대용량 그래프들을 분석하기 위한 알고리즘과 시스템을 개발하고 응용 분야들을 연구한다.

알고리즘과 시스템:

데이터 마이닝 연구실에서는 다음과 같은 분야에 대한 알고리즘과 시스템을 개발한다.

  • 그래프 처리 엔진: 수천억의 이상의 간선을 가지는 큰 크기의 대용량 그래프를 다루기 위해 대용량 그래프 처리 엔진과 그와 연관된 알고리즘을 개발한다.
  • 삼각형과 부분 그래프 마이닝: 삼각형과 부분 그래프 매칭 및 마이닝을 위한 분산 알고리즘을 개발한다.
  • 그래프에서의 랭킹 (random walk with restart): 실세계 그래프의 특징들을 이용하는 빠른 Random walk with restart 알고리즘을 개발한다.

응용:

데이터 마이닝 연구실에서는 다음과 같은 응용 분야에 대하여 연구한다.

  • 시간 변화 그래프: 실세계 그래프의 시간 경향을 분석한다.
  • 링크 예측: 미래에 생성될 링크들을 예측한다.
  • 커뮤니티 탐색: 다른 그룹과는 느슨하게 연결되어 있지만 그룹 내에선 서로 긴밀하게 연결된 커뮤니티를 찾는다.
  • 이상 신호 탐색: 그래프에서 이상한 사용자와 행동을 찾는다.
  • 시각화 및 의미 찾기: 그래프를 요약, 이해하고 그래프에서의 의미를 찾는다.

소프트웨어

  • Pegasus: 페타(Peta) 규모의 그래프 마이닝 시스템

연구실적

  • Ha-Myung Park, Francesco Silvestri, Rasmus Pagh, Chin-wan Chung, Sung-Hyon Myaeng, and U Kang, "Enumerating Trillion Subgraphs On Distributed Systems", ACM Transactions on Knowledge Discovery from Data (TKDD), 2018. [PDF] [BIBTEX] [HOMEPAGE (CODE , DATA)]
  • Minji Yoon, Jinhong Jung, and U Kang, "TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs", 34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France. [PDF]
  • Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang, "SIDE: Representation Learning in Signed Directed Networks", The Web Conference (WWW) 2018, Lyon, France. [PDF] [BIBTEX]
  • Minji Yoon, Woojeong Jin, and U Kang, "Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees", The Web Conference (WWW) 2018, Lyon, France. [PDF] [BIBTEX]
  • Ha-Myung Park, Chiwan Park, and U Kang, "PegasusN: A Scalable and Versatile Graph Mining System", Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) 2018, New Orleans, Lousiana, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Haekyu Park, Jinhong Jung, and U Kang, "A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems", IEEE International Conference on Big Data (BigData) 2017, Boston, MA, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Yongsub Lim, Minsoo Jung, and U Kang, "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs", ACM Transactions on Knowledge Discovery from Data (TKDD), vo. 12, issue 1, Feburuary 2018. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Jinhong Jung, Namyong Park, Lee Sael, and U Kang, "BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart", ACM International Conference on Management of Data (SIGMOD) 2017, Raleigh, North Carolina, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, and U Kang, "Partition Aware Connected Component Computation in Distributed Systems", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang, "Personalized Ranking in Signed Networks using Signed Random Walk with Restart", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Ha-Myung Park, Sung-Hyon Myaeng, and U Kang, "PTE: Enumerating Trillion Triangles On Distributed System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2016, San Francisco, USA. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Minsoo Jung, Sunmin Lee, Yongsub Lim, U Kang, "FURL: Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams", arXiv: 1611.06615 [cs.DS], 26 November 2016. [BIBTEX] [PDF]
  • Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, and Jiwoon Ha, "PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information", ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, USA. [BIBTEX] [PDF]
  • Hugo Gualdron, Robson Cordeiro, Jose Rodrigeus-Jr, Duen Horng (Polo) Chau, Minsuk Kahng, and U Kang., "M-Flash: Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model", European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD) 2016, Riva Del Garda, Italy. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Yongsub Lim, Won-Jo Lee, Ho-Jin Choi, and U Kang, "MTP: Discovering High Quality Partitions in Real World Graphs", World Wide Web Journal [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Jinhong Jung, Kijung Shin, Lee Sael, and U Kang, "Random Walk with Restart on Large Graphs Using Block Elimination", ACM Transactions on Database Systems (TODS), vol. 41, issue 2, pp. 12:1-12:43, June 2016. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • ByungSoo Jeon, Inah Jeon, and U Kang, "TeGViz: Distributed Tera-Scale Graph Generation and Visualization", IEEE International Conference on Data Mining (ICDM) 2015, Atlantic City, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Kijung Shin, Jinhong Jung, Lee Sael, and U Kang, "BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs", ACM International Conference on Management of Data (SIGMOD) 2015, Melbourne, Australia [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "Summarizing and understanding large graphs", Statistical Analysis and Data Mining, doi: 10.1002/sam.11267, 18 May 2015. [BIBTEX] [PDF]
  • Yongsub Lim , Won-Jo Lee , Ho-Jin Choi, and U Kang, "Discovering Large Subsets with High Quality Partitions in Real World Graphs", Second International Conference on Big Data and Smart Computing (BigComp) 2015, Jeju, korea. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Ha-Myung Park, Francesco Silvestri, U Kang, Rasmus Pagh, "MapReduce Triangle Enumeration With Guarantees", 23rd ACM International Conference on Information and Knowledge Management (CIKM) 2014, Shaghai, China [BIBTEX] [HOMEPAGE (CODE, DATASET)] [PDF]
  • Yongsub Lim, U Kang, and Christos Faloutsos, "SlashBurn: Graph Compression and Mining beyond Caveman Communities", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 26, no. 12, pp. 3077-3089, April 2014. [BIBTEX] [CODE] [PDF]
  • Jungeun Kim, Minsoo Choy, Daehoon Kim, and U Kang, "Link Prediction Based on Generalized Cluster Information", 23rd International World Wide Web Conference (WWW) 2014, Seoul, Korea. [BIBTEX] [PDF]
  • U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos, "Net-Ray: Visualizing and Mining Billion-Scale Graphs", Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2014, Tainan, Taiwan. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "VoG: Summarizing and Understanding Large Graphs", SIAM International Conference on Data Mining (SDM) 2014, Philadelphia, Pennsylvania, USA. [BIBTEX] [PDF]
  • Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, U Kang, and Duen Horng Chau, "Interactive Multi-resolution Exploration of Million Node Graphs", IEEE VIS 2013, Atlanta, Georgia, USA. [BIBTEX] [PDF]
  • U Kang, Martial Hebert, and Soonyong Park, "Fast and Scalable Approximate Spectral Graph Matching for Correspondence Problems", Information Sciences, 2013. [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Big graph mining: algorithms and discoveries", ACM SIGKDD Explorations Newsletter Volume 14 Issue 2, December 2012. pp. 29-36. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: An Efficient Analysis Platform for Large Graphs", VLDB Journal, 2012. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, and Jimeng Sun, "Fast Random Walk Graph Kernel", SIAM International Conference on Data Mining (SDM) 2012, Anaheim, California, USA. (acceptance rate 27 %) [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining", IEEE International Conference on Data Mining (ICDM) 2011, Vancouver, Canada. (acceptance rate 12.2 %) [BIBTEX] [CODE] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: A Scalable and General Graph Management System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2011, San Diego, CA, USA. (acceptance rate 17.5 %) [BIBTEX] [PDF]
  • U Kang, Spiros Papadimitriou, Jimeng Sun, and Hanghang Tong, "Centralities in Large Networks: Algorithms and Observations", SIAM International Conference on Data Mining (SDM) 2011, Mesa, Arizona, USA. (acceptance rate 25.1 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "HADI: Mining Radii of Large Graphs", ACM Transactions on Knowledge Discovery from Data (TKDD), 2011. [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: Mining Peta-Scale Graphs", Knowledge and Information Systems (KAIS), Springer, 2011. [BIBTEX] [PDF]
  • U Kang, Mary McGlohon, Leman Akoglu, and Christos Faloutsos, "Patterns on the Connected Components of Terabyte-Scale Graphs", IEEE International Conference on Data Mining (ICDM) 2010, Sydney, Australia. (acceptance rate 19.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations", SIAM International Conference on Data Mining (SDM) 2010, Columbus, Ohio, USA. (acceptance rate 23.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations", IEEE International Conference on Data Mining (ICDM) 2009, Miami, Florida, USA. (acceptance rate 8.9 %) [BIBTEX] [PDF] [PEGASUS HOMEPAGE]
  • Charalampos E. Tsourakakis, U Kang, Gary Miller, and Christos Faloutsos, "DOULION: Counting Triangles in Massive Graphs with a Coin", ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009, Paris, France. (acceptance rate 20 %) [BIBTEX] [PDF]