Projects

Big Tensor Mining

How can we find useful patterns and anomalies in large-scale real-world data with multiple attributes, such as network intrusion logs with (source-ip, target-ip, portnumber, timestamp)? Tensors are well suited to modeling such multi-dimensional data, and are widely used for the analysis of social networks, web data, network traffic, and many other settings. However, current tensor decomposition methods do not scale to tensors with millions or billions of rows, columns, and 'fibers', which often appear in real datasets.
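As a small illustration of how such logs map onto a tensor, the sketch below builds a 4-mode count tensor in coordinate (sparse) format from a handful of records. The records, the function name `build_sparse_tensor`, and the choice of counting occurrences as the cell value are assumptions made for this example only; they are not taken from the project's software.

```python
from collections import Counter

# Hypothetical log records: (source-ip, target-ip, port number, timestamp).
records = [
    ("10.0.0.1", "10.0.0.9", 22, "2018-01-01T00"),
    ("10.0.0.1", "10.0.0.9", 22, "2018-01-01T00"),
    ("10.0.0.2", "10.0.0.9", 80, "2018-01-01T01"),
    ("10.0.0.1", "10.0.0.7", 22, "2018-01-01T01"),
]


def build_sparse_tensor(records):
    """Build a count tensor in coordinate format.

    Returns (entries, index_maps), where entries maps an index tuple to the
    number of matching records, and index_maps[m] maps each attribute value
    of mode m to its integer index along that mode.
    """
    n_modes = len(records[0])
    index_maps = [{} for _ in range(n_modes)]
    entries = Counter()
    for rec in records:
        # Assign each new attribute value the next free index in its mode.
        idx = tuple(
            index_maps[m].setdefault(value, len(index_maps[m]))
            for m, value in enumerate(rec)
        )
        entries[idx] += 1
    return entries, index_maps


entries, index_maps = build_sparse_tensor(records)
shape = tuple(len(m) for m in index_maps)
print(shape, len(entries))
```

Only the observed index combinations are stored, so memory grows with the number of nonzero entries rather than with the product of the mode sizes.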

In this project, we design and develop large-scale tensor analysis algorithms. Our goal is to design algorithms that fully exploit the sparsity of real-world tensors to boost performance. The supported algorithms include PARAFAC decomposition, coupled matrix-tensor decomposition, Tucker decomposition, and nonnegative tensor decompositions.
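To make the role of sparsity concrete, here is a minimal single-machine sketch of the matricized-tensor times Khatri-Rao product (MTTKRP), the kernel that dominates the cost of alternating least squares for PARAFAC. It iterates over the nonzero entries only. The function name `mttkrp`, the dictionary-based tensor format, and the toy factor matrices are assumptions for illustration; this is not the distributed implementation used by the project's tools.

```python
def mttkrp(entries, factors, mode, rank):
    """Sparse MTTKRP along one mode.

    entries: dict mapping index tuples to nonzero values.
    factors: list of factor matrices (lists of rows), one per mode,
             each with `rank` columns.
    Returns a matrix with one row per index of `mode` and `rank` columns,
    where out[i][r] sums value * product of the other modes' factor
    entries over all nonzeros whose index along `mode` equals i.
    """
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    for idx, value in entries.items():
        row = out[idx[mode]]
        for r in range(rank):
            prod = value
            for m, i in enumerate(idx):
                if m != mode:
                    prod *= factors[m][i][r]
            row[r] += prod
    return out


# Toy 2 x 3 x 2 tensor with two nonzeros, and rank-2 factor matrices.
toy_entries = {(0, 0, 0): 1.0, (1, 2, 1): 2.0}
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = [[1.0, 1.0], [2.0, 3.0]]

result = mttkrp(toy_entries, [A, B, C], mode=0, rank=2)
print(result)  # [[1.0, 2.0], [20.0, 36.0]]
```

The work is proportional to the number of nonzeros times the rank times the number of modes, and the full Khatri-Rao product of the factor matrices is never materialized.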

Applications:

Our proposed tools analyze various real-world matrix and tensor data, with applications including the following.

  • Trend analysis of time evolving graphs
  • Network security (e.g., detecting anomalous users or activities)
  • Healthcare data (e.g., fMRI) analysis
  • Knowledge base (e.g., FreeBase, Yago) analysis

Software

  • BigTensor: a large-scale tensor analysis tool for distributed platforms.

Publications

  • Jun-gi Jang, Dongjin Choi, Jinhong Jung, and U Kang, "Zoom-SVD: Fast and Memory Efficient Method for Extracting Key Patterns in an Arbitrary Time Range", ACM International Conference on Information and Knowledge Management (CIKM) 2018, Lingotto, Turin, Italy. [PDF]
  • Sejoon Oh, Namyong Park, Lee Sael, and U Kang, "Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries", 34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France. [PDF]
  • Namyong Park, Sejoon Oh, and U Kang, "Fast and Scalable Distributed Boolean Tensor Factorization", IEEE International Conference on Data Engineering (ICDE) 2017, San Diego, CA, USA. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Kijung Shin, Lee Sael, and U Kang, "Fully Scalable Methods for Distributed Tensor Factorization", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 29, no. 1, pp. 100-113, Jan. 1 2017. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Namyong Park, Byungsoo Jeon, Jungwoo Lee, and U Kang, "BIGtensor: Mining Billion-Scale Tensor Made Easy", ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, USA. [BIBTEX] [HOMEPAGE (CODE)] [PDF]
  • Inah Jeon, Evangelos E. Papalexakis, Christos Faloutsos, Lee Sael, and U Kang, "Mining Billion-Scale Tensors: Algorithm and Discoveries", VLDB Journal, vol. 25, issue 4, pp. 519-544, August 2016. [BIBTEX] [PDF] [HOMEPAGE (CODE, DATA)]
  • ByungSoo Jeon, Inah Jeon, Lee Sael, and U Kang, "SCouT: Scalable Coupled Matrix-Tensor Factorization - Algorithms and Discoveries", 32nd IEEE International Conference on Data Engineering (ICDE) 2016, Helsinki, Finland. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Inah Jeon, Evangelos E. Papalexakis, U Kang, and Christos Faloutsos, "HaTen2: Billion-scale Tensor Decompositions", 31st IEEE International Conference on Data Engineering (ICDE) 2015, Seoul, Korea. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF] [SUPPLEMENTARY DOCUMENT]
  • Lee Sael, Inah Jeon, and U Kang, "Scalable Tensor Mining", Big Data Research Journal, Feb. 2015. [BIBTEX] [PDF]
  • Evangelos E. Papalexakis, U Kang, Christos Faloutsos, Nicholas D. Sidiropoulos, and Abhay Harpale, "Large Scale Tensor Decompositions: Algorithmic Developments and Applications", Bulletin of the Technical Committee on Data Engineering, vol. 36, no. 3, September 2013. [BIBTEX] [PDF]
  • U Kang, Evangelos Papalexakis, Abhay Harpale, and Christos Faloutsos, "GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2012, Beijing, China. [BIBTEX] [PDF]
  • U Kang, Brendan Meeder, and Christos Faloutsos, "Spectral Analysis for Billion-Scale Graphs: Discoveries and Implementation", Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2011, Shenzhen, China. (acceptance rate 9.7%) [BIBTEX] [PDF]