BIGtensor: Mining Billion-Scale Tensor Made Easy

Overview

Many real-world datasets are naturally represented as tensors, or multi-dimensional arrays. Tensor decomposition is an important tool for analyzing tensors in applications such as latent concept discovery, trend analysis, clustering, and anomaly detection. However, existing tools for tensor analysis either do not scale to billion-scale tensors or offer limited functionality.
In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both problems. Carefully designed for scalability, BIGtensor decomposes tensors at least 100× larger than the current state of the art. We demonstrate how BIGtensor can help users discover hidden concepts and analyze trends in large-scale tensors that are hard to process with existing tools.

Paper

- BIGtensor: Mining Billion-Scale Tensor Made Easy
  Namyong Park, Byungsoo Jeon, Jungwoo Lee, U Kang.
  25th ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, United States.

Comparison

Comparison of functionalities provided by BIGtensor and other state-of-the-art tensor tools.
( P: PARAFAC, T: Tucker, PN: PARAFAC-Nonnegative, TN: Tucker-Nonnegative, C: CMTF )

Functionality           | BIGtensor       | Tensor Toolbox | FlexiFaCT
Tensor Decomposition    | P, PN, T, TN, C | P, PN, T       | P, PN, C
Tensor Generation       | Yes             | Yes            | No
Tensor-Tensor Operation | Yes             | Yes            | No
Tensor Manipulation     | Yes             | Yes            | No
Distributed             | Yes             | No             | Yes
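
To make the decomposition names in the legend concrete, the following is a minimal sketch of the standard PARAFAC and Tucker models for a 3-mode tensor. It is not BIGtensor's API or implementation; the function names and the toy factor matrices are hypothetical and chosen only for illustration. PARAFAC models entry (i, j, k) as a sum over R rank-one components, Tucker adds a core tensor G that mixes the components, and PARAFAC is the special case of Tucker whose core is superdiagonal.

```python
from itertools import product


def parafac_entry(A, B, C, i, j, k):
    """Entry (i, j, k) of a rank-R PARAFAC model: sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))


def tucker_entry(G, A, B, C, i, j, k):
    """Entry (i, j, k) of a Tucker model with a P x Q x R core tensor G."""
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    return sum(
        G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
        for p, q, r in product(range(P), range(Q), range(R))
    )


def superdiagonal_core(rank):
    """Core with ones where p == q == r and zeros elsewhere."""
    return [
        [[1.0 if p == q == r else 0.0 for r in range(rank)] for q in range(rank)]
        for p in range(rank)
    ]


# Toy rank-2 factor matrices with made-up values (2 x 2, 3 x 2, and 2 x 2).
A = [[1.0, 2.0], [0.0, 1.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
C = [[1.0, 1.0], [2.0, 0.0]]
G = superdiagonal_core(2)

# With a superdiagonal core, the Tucker model reproduces the PARAFAC model.
for i, j, k in product(range(2), range(3), range(2)):
    assert parafac_entry(A, B, C, i, j, k) == tucker_entry(G, A, B, C, i, j, k)
```

The nonnegative variants (PN, TN) use the same models with the added constraint that all factor entries are nonnegative.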

Scalability of BIGtensor and other tools. For two representative tensor decomposition algorithms, PARAFAC and Tucker, we report the mode length and the density of the largest data each tool processes. The nonnegative versions of these algorithms and CMTF show similar performance. BIGtensor decomposes data with 100× larger mode length than both of the other tools, and 100× denser data than the Tensor Toolbox.

Scalability            | Method  | BIGtensor | Tensor Toolbox | FlexiFaCT
Mode Length & Nonzeros | PARAFAC | ≥ 10^9    | ≤ 10^7         | ≤ 10^7
Mode Length & Nonzeros | Tucker  | ≥ 10^9    | ≤ 10^7         | -
Density                | PARAFAC | ≥ 10^-5   | ≤ 10^-7        | ≥ 10^-5
Density                | Tucker  | ≥ 10^-5   | ≤ 10^-7        | -

Code

The binary code of BIGtensor is available here.

Download BIGtensor - v1.0

Dataset

Name                     | Structure                           | Dimensionality           | Nonzero | Download | Description
Microsoft Academic Graph | Paper - Author - Affiliation        | 123M × 123M × 2.7M       | 325M    | DOWN     | Papers and their metadata
NELL                     | NounPhrase1 - NounPhrase2 - Context | 26M × 26M × 48M          | 144M    | -        | "Read the Web" Project
MovieLens                | User - Movie - YearMonth            | 72K × 11K × 157          | 10M     | DOWN     | Movie rating data
YELP                     | User - Business - YearMonth         | 71K × 16K × 108          | 334K    | DOWN     | Business rating data
PhoneCall                | Source - Destination - Date         | 30M × 30M × 62           | 1B      | -        | Phone call traffic data
Random                   | I - J - K                           | 1K~1B × 1K~1B × 1K~1B    | 10K~10B | -        | Synthetic random data
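
As a rough illustration of how the dimensionality and nonzero counts above relate to density, the sketch below computes density as the number of nonzeros divided by the product of the mode lengths. This definition is an assumption (it is the usual one for sparse tensors, but the page does not state it), and the inputs are the rounded figures from the table, reading K as 10^3 and M as 10^6. The `density` helper is a hypothetical name, not part of BIGtensor.

```python
def density(mode_lengths, nonzeros):
    """Fraction of cells that are nonzero: nonzeros / (I * J * K)."""
    cells = 1
    for length in mode_lengths:
        cells *= length
    return nonzeros / cells


# Rounded values taken from the dataset table (K = 10^3, M = 10^6).
datasets = {
    "MovieLens": ((72_000, 11_000, 157), 10_000_000),
    "YELP": ((71_000, 16_000, 108), 334_000),
}

for name, (shape, nnz) in datasets.items():
    print(f"{name}: density ~ {density(shape, nnz):.1e}")
```

Because the table values are rounded, the results are order-of-magnitude estimates rather than exact densities.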

People

Namyong Park
Department of Computer Science and Engineering
Seoul National University

Byungsoo Jeon
Department of Computer Science and Engineering
Seoul National University

Jungwoo Lee
Department of Computer Science and Engineering
Seoul National University

U Kang
Department of Computer Science and Engineering
Seoul National University

Copyright © 2016, By Data Mining Laboratory, Department of Computer Science and Engineering, Seoul National University, All Rights Reserved.