연구분야

그래프와 텐서

어떻게 그래프나 여러 차원을 가지는 데이터를 분석할 수 있을까? 우리는 그래프와 텐서에 대한 모델, 알고리즘과 시스템을 개발한다.

그래프

페이스북과 트위터 네트워크와 같은 대용량 그래프에서 어떻게 패턴이나 이상 신호를 찾을 수 있을까? 이러한 그래프에서 어떻게 노드를 분류하고 순위를 매길 수 있을까? 그래프는 우리 생활 어디에서나 찾아볼 수 있으며 그 예로는 소셜 네트워크, 인터넷, 생물학 네트워크 등이 있다. 이 프로젝트에서는 대용량 그래프들을 분석하기 위한 알고리즘과 시스템을 개발하고 응용 분야들을 연구한다.

그래프 알고리즘과 시스템:

  • 랭킹과 관계추론을 위한 Random walk with restart 개발한다.
  • 그래프 기계학습: Graph Convolutional Network (GCN) and Graph Attention Network (GAT)를 포함하여 최근 작품을 능가하는 최첨단 노드 분류 및 표현 학습 방법을 개발한다.
  • 확장 가능한 그래프 마이닝: 수십억 개 이상의 간선을 가지는 매우 큰 그래프를 처리하기 위해 대형 그래프 처리 엔진 및 관련 알고리즘을 개발한다.

그래프 응용:

데이터 마이닝 연구실에서는 다음과 같은 응용 분야에 대하여 연구한다.

  • 시간 변화 그래프: 실세계 그래프의 시간 경향을 분석한다.
  • 링크 예측: 미래에 생성될 링크들을 예측한다.
  • 커뮤니티 탐색: 다른 그룹과는 느슨하게 연결되어 있지만 그룹 내에선 서로 긴밀하게 연결된 커뮤니티를 찾는다.
  • 이상 신호 탐색: 그래프에서 이상한 사용자와 행동을 찾는다.
  • 시각화 및 의미 찾기: 그래프를 요약, 이해하고 그래프에서의 의미를 찾는다.\r\n

그래프 소프트웨어

  • Pegasus: 페타(Peta) 규모의 그래프 마이닝 시스템

텐서

네트워크 침입 로그 (출발지 IP, 목적지 IP, 포트번호, 기록 시간) 같은 대용량의 실세계 데이터에서 여러 특성들을 동시에 고려하면서 어떻게 유용한 패턴과 이상 신호를 찾을 수 있을까? 텐서는 다차원 데이터 모델링에 적합하며 소셜 네트워크, 웹 데이터, 네트워크 트래픽, 그 외 다른 환경들에서의 분석을 위해 널리 사용되고 있다. 그러나 현재의 텐서 분해 기법은 실제 데이터에서 나타나는 수백만 그리고 수십억의 행, 열, `fiber`를 포함하는 텐서에 적용할 수 없다.

텐서 알고리즘과 시스템:

데이터 마이닝 연구실에서는 고확장성 텐서 분석 알고리즘을 설계하고 개발한다. 프로젝트의 목표는 성능 증가를 위하여 실세계 텐서의 희소성을 완벽히 이용하는 알고리즘을 만드는 것이다. 지원하는 알고리즘은 PARAFAC 분해, 결합 행렬-텐서 분해, Tucker 분해, 음수 미포함 텐서 분해를 포함한다.

텐서 응용:

데이터 마이닝 연구실에서는 다음과 같은 응용 분야에 대하여 연구한다.

  • 시간 변화 그래프의 경향 분석
  • 네트워크 보안 (즉, 이상 사용자 또는 행동을 감지)
  • 헬스케어 데이터 분석 (예: fMRI)
  • 지식 기반 분석 (예: FreeBase, Yago)

텐서 소프트웨어

  • BigTensor: 분산 플랫폼에서의 고확장성 텐서 분석 툴.

연구실적

  • Jun-Gi Jang and U Kang, "D-Tucker: Fast and Memory-Efficient Tucker Decomposition for Dense Tensors", 36th IEEE International Conference on Data Engineering (ICDE) 2020, Dallas, Texas, USA.
  • Chiwan Park, Ha-Myung Park, and U Kang, "FlexGraph: Flexible partitioning and storage for scalable graph mining.", PLOS ONE (PLOS ONE), 2020.
  • Jinhong Jung, Ha-Myung Park, and U Kang, "BalanSiNG: Fast and Scalable Generation of Realistic Signed Networks", 23rd International Conference on Extending Database Technology (EDBT) 2020, Copenhagen, Denmark. [PDF]
  • Sejoon Oh, Namyong Park, Jun-Gi Jang, Lee Sael, and U Kang, "High-Performance Tucker Factorization on Heterogeneous Platforms.", IEEE Transactions on Parallel and Distributed Systems, 2019.
  • Yongsub Lim, Injae Yu, Dongmin Seo, U Kang, and Lee Sael., "PS-MCL: parallel shotgun coarsened Markov clustering of protein interaction networks.", (BMC Bioinformatics), vol. 20, no. 381, 2019.
  • Donjin Choi, Jun-Gi Jang, and U Kang, "S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization.", PLOS ONE (PLOS ONE), 2019.
  • Jaemin Yoo, U Kang, Mauro Scanagatta, Giorgio Corani, and Marco Zaffalon, "Sampling Subgraphs with Guaranteed Treewidth for Accurate and Efficient Graphical Inference", The 13th ACM International WSDM Conference (WSDM) 2020, Houston, USA. [BIBTEX] [PDF]
  • Jamin Yoo, Hyunsik Jeon, and U Kang., "Belief Propagation Network for Hard Inductive Semi-supervised Learning", 28th International Joint Conference on Artificial Intelligence (IJCAI) 2019, Macao, China. [BIBTEX] [HOMEPAGE] [PDF]
  • Jun-gi Jang, Dongjin Choi, Jinhong Jung, and U Kang., "Zoom-SVD: Fast and Memory Efficient Method for Extracting Key Patterns in an Arbitrary Time Range", ACM International Conference on Information and Knowledge Management (CIKM) 2018, Lingotto, Turin, Italy. [BIBTEX] [HOMEPAGE] [PDF]
  • Ha-Myung Park, Francesco Silvestri, Rasmus Pagh, Chin-wan Chung, Sung-Hyon Myaeng, and U Kang, "Enumerating Trillion Subgraphs On Distributed Systems", ACM Transactions on Knowledge Discovery from Data (TKDD), 2018. [PDF] [BIBTEX] [HOMEPAGE (CODE , DATA)]
  • Sejoon Oh, Namyong Park, Lee Sael, and U Kang., "Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries", 34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France. [BIBTEX] [HOMEPAGE] [PDF]
  • Minji Yoon, Jinhong Jung, and U Kang, "TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs", 34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France. [BIBTEX] [HOMEPAGE] [PDF]
  • Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang, "SIDE: Representation Learning in Signed Directed Networks", The Web Conference (WWW) 2018, Lyon, France. [BIBTEX] [HOMEPAGE] [PDF]
  • Minji Yoon, Woojeong Jin, and U Kang, "Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees", The Web Conference (WWW) 2018, Lyon, France. [BIBTEX] [HOMEPAGE] [PDF]
  • Ha-Myung Park, Chiwan Park, and U Kang, "PegasusN: A Scalable and Versatile Graph Mining System", Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) 2018, New Orleans, Lousiana, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Haekyu Park, Jinhong Jung, and U Kang, "A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems", IEEE International Conference on Big Data (BigData) 2017, Boston, MA, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Yongsub Lim, Minsoo Jung, and U Kang, "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs", ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 12, issue 1, Feburuary 2018. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Namyong Park, Sejoon Oh, and U Kang, "Fast and Scalable Distributed Boolean Tensor Factorization", IEEE International Conference on Data Engineering (ICDE) 2017, San Diego, CA, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Kijung Shin, Lee Sael, and U Kang, "Fully Scalable Methods for Distributed Tensor Factorization", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 29, no. 1, pp. 100-113, Jan. 1 2017. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Jinhong Jung, Namyong Park, Lee Sael, and U Kang, "BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart", ACM International Conference on Management of Data (SIGMOD) 2017, Raleigh, North Carolina, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, and U Kang, "Partition Aware Connected Component Computation in Distributed Systems", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang, "Personalized Ranking in Signed Networks using Signed Random Walk with Restart", IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Namyong Park, Byungsoo Jeon, Jungwoo Lee, and U Kang, "BIGtensor: Mining Billion-Scale Tensor Made Easy", ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Ha-Myung Park, Sung-Hyon Myaeng, and U Kang, "PTE: Enumerating Trillion Triangles On Distributed System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2016, San Francisco, USA. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Minsoo Jung, Sunmin Lee, Yongsub Lim, U Kang, "FURL: Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams", arXiv: 1611.06615 [cs.DS], 26 November 2016. [BIBTEX] [PDF]
  • Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, and Jiwoon Ha, "PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information", ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, USA. [BIBTEX] [PDF]
  • Hugo Gualdron, Robson Cordeiro, Jose Rodrigeus-Jr, Duen Horng (Polo) Chau, Minsuk Kahng, and U Kang., "M-Flash: Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model", European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD) 2016, Riva Del Garda, Italy. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Yongsub Lim, Won-Jo Lee, Ho-Jin Choi, and U Kang, "MTP: Discovering High Quality Partitions in Real World Graphs", World Wide Web Journal [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Inah Jeon, Evangelos E. Papalexakis, Christos Faloutsos, Lee Sael, and U Kang, "Mining Billion-Scale Tensors: Algorithm and Discoveries", VLDB Journal, vol. 25, issue 4, pp. 519-544, August 2016. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • ByungSoo Jeon, Inah Jeon, Sael Lee, U Kang, "SCouT: Scalable Coupled Matrix-Tensor Factorization-Algorithms and Discoveries", 32nd IEEE International Conference on Data Engineering (ICDE) 2016, Helsinki, Finland. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Jinhong Jung, Kijung Shin, Lee Sael, and U Kang, "Random Walk with Restart on Large Graphs Using Block Elimination", ACM Transactions on Database Systems (TODS), vol. 41, issue 2, pp. 12:1-12:43, June 2016. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • ByungSoo Jeon, Inah Jeon, and U Kang, "TeGViz: Distributed Tera-Scale Graph Generation and Visualization", IEEE International Conference on Data Mining (ICDM) 2015, Atlantic City, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Kijung Shin, Jinhong Jung, Lee Sael, and U Kang, "BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs", ACM International Conference on Management of Data (SIGMOD) 2015, Melbourne, Australia [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Inah Jeon, Evangelos E. Papalexakis, U Kang, and Christos Faloutsos, "HaTen2: Billion-scale Tensor Decompositions", 31st IEEE International Conference on Data Engineering (ICDE) 2015, Seoul, Korea. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF] [SUPPLEMENTARY DOCUMENT]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "Summarizing and understanding large graphs", Statistical Analysis and Data Mining, doi: 10.1002/sam.11267, 18 May 2015. [BIBTEX] [PDF]
  • Yongsub Lim , Won-Jo Lee , Ho-Jin Choi, and U Kang, "Discovering Large Subsets with High Quality Partitions in Real World Graphs", Second International Conference on Big Data and Smart Computing (BigComp) 2015, Jeju, korea. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • Lee Sael, Inah Jeon, and U Kang, "Scalable Tensor Mining", Big Data Research Journal, Feb. 2015. [BIBTEX] [PDF]
  • Ha-Myung Park, Francesco Silvestri, U Kang, Rasmus Pagh, "MapReduce Triangle Enumeration With Guarantees", 23rd ACM International Conference on Information and Knowledge Management (CIKM) 2014, Shaghai, China [BIBTEX] [HOMEPAGE (CODE, DATASET)] [PDF]
  • Yongsub Lim, U Kang, and Christos Faloutsos, "SlashBurn: Graph Compression and Mining beyond Caveman Communities", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 26, no. 12, pp. 3077-3089, April 2014. [BIBTEX] [CODE] [PDF]
  • Jungeun Kim, Minsoo Choy, Daehoon Kim, and U Kang, "Link Prediction Based on Generalized Cluster Information", 23rd International World Wide Web Conference (WWW) 2014, Seoul, Korea. [BIBTEX] [PDF]
  • U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos, "Net-Ray: Visualizing and Mining Billion-Scale Graphs", Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2014, Tainan, Taiwan. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos, "VoG: Summarizing and Understanding Large Graphs", SIAM International Conference on Data Mining (SDM) 2014, Philadelphia, Pennsylvania, USA. [BIBTEX] [PDF]
  • Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, U Kang, and Duen Horng Chau, "Interactive Multi-resolution Exploration of Million Node Graphs", IEEE VIS 2013, Atlanta, Georgia, USA. [BIBTEX] [PDF]
  • U Kang, Martial Hebert, and Soonyong Park, "Fast and Scalable Approximate Spectral Graph Matching for Correspondence Problems", Information Sciences, 2013. [BIBTEX] [PDF]
  • Evangelos E. Papalexakis, U Kang, Christos Faloutsos, Nicholas D. Sidiropoulosx, and Abhay Harpale, "Large Scale Tensor Decompositions: Algorithmic Developments and Applications", Bulletin of the Technical Committee on Data Engineering, vol. 36, no. 3, September 2013. [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Big graph mining: algorithms and discoveries", ACM SIGKDD Explorations Newsletter Volume 14 Issue 2, December 2012. pp. 29-36. [BIBTEX] [PDF]
  • U Kang, Evangelos Papalexakis, Abhay Harpale, and Christos Faloutsos, "GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2012, Beijing, China. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: An Efficient Analysis Platform for Large Graphs", VLDB Journal, 2012. [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, and Jimeng Sun, "Fast Random Walk Graph Kernel", SIAM International Conference on Data Mining (SDM) 2012, Anaheim, California, USA. (acceptance rate 27 %) [BIBTEX] [PDF]
  • U Kang and Christos Faloutsos, "Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining", IEEE International Conference on Data Mining (ICDM) 2011, Vancouver, Canada. (acceptance rate 12.2 %) [BIBTEX] [CODE] [PDF]
  • U Kang, Brendan Meeder, and Christos Faloutsos, "Spectral Analysis for Billion-Scale Graphs: Discoveries and Implementation", Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2011, Shenzhen, China. (acceptance rate 9.7 %) [BIBTEX] [PDF]
  • U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos, "GBASE: A Scalable and General Graph Management System", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2011, San Diego, CA, USA. (acceptance rate 17.5 %) [BIBTEX] [PDF]
  • U Kang, Spiros Papadimitriou, Jimeng Sun, and Hanghang Tong, "Centralities in Large Networks: Algorithms and Observations", SIAM International Conference on Data Mining (SDM) 2011, Mesa, Arizona, USA. (acceptance rate 25.1 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "HADI: Mining Radii of Large Graphs", ACM Transactions on Knowledge Discovery from Data (TKDD), 2011. [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: Mining Peta-Scale Graphs", Knowledge and Information Systems (KAIS), Springer, 2011. [BIBTEX] [PDF]
  • U Kang, Mary McGlohon, Leman Akoglu, and Christos Faloutsos, "Patterns on the Connected Components of Terabyte-Scale Graphs", IEEE International Conference on Data Mining (ICDM) 2010, Sydney, Australia. (acceptance rate 19.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec, "Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations", SIAM International Conference on Data Mining (SDM) 2010, Columbus, Ohio, USA. (acceptance rate 23.4 %) [BIBTEX] [PDF]
  • U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos, "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations", IEEE International Conference on Data Mining (ICDM) 2009, Miami, Florida, USA. (acceptance rate 8.9 %) [BIBTEX] [PDF] [PEGASUS HOMEPAGE]
  • Charalampos E. Tsourakakis, U Kang, Gary Miller, and Christos Faloutsos, "DOULION: Counting Triangles in Massive Graphs with a Coin", ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009, Paris, France. (acceptance rate 20 %) [BIBTEX] [PDF]