BIGtensor: Mining Billion-Scale Tensor Made Easy

Overview

Many real-world datasets are naturally represented as tensors, or multi-dimensional arrays. Tensor decomposition is an important tool for analyzing tensors in applications such as latent concept discovery, trend analysis, clustering, and anomaly detection. However, existing tensor analysis tools either do not scale to billion-scale tensors or offer limited functionality.
In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of these problems. Carefully designed for scalability, BIGtensor decomposes tensors at least 100× larger than the current state of the art can handle. We demonstrate how BIGtensor helps users discover hidden concepts and analyze trends in large-scale tensors that existing tools struggle to process.
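To make "tensor decomposition" concrete, here is a minimal sketch of how a rank-R PARAFAC (CP) model represents a 3-mode tensor: each entry x[i][j][k] is approximated by summing, over R latent concepts, the product of one row from each factor matrix A, B, and C. The function names and toy factor values are hypothetical illustration data, not BIGtensor's API or output, and the sketch only reconstructs entries from given factors; it does not compute the decomposition itself.

```python
from typing import List

Matrix = List[List[float]]


def cp_entry(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """Reconstruct entry (i, j, k) from CP factors: sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))


def cp_full(A: Matrix, B: Matrix, C: Matrix) -> List[List[List[float]]]:
    """Rebuild the full I x J x K tensor (only sensible for tiny examples)."""
    return [[[cp_entry(A, B, C, i, j, k) for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


if __name__ == "__main__":
    # Hypothetical rank-2 model: 2 users x 3 movies x 2 months.
    # Column r of A, B, C describes how strongly each user, movie,
    # and month takes part in latent concept r.
    A = [[1.0, 0.0], [0.0, 2.0]]
    B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    C = [[1.0, 0.5], [0.0, 1.0]]
    X = cp_full(A, B, C)
    print(X[1][2][1])
```

Reading the factor columns rather than the rebuilt tensor is what supports "latent concept discovery": each column r groups the users, movies, and months that co-occur in concept r.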

Paper

- BIGtensor: Mining Billion-Scale Tensor Made Easy

Namyong Park, Byungsoo Jeon, Jungwoo Lee, U Kang.

25th ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, United States.


Comparison

Comparison of functionalities provided by BIGtensor and other state-of-the-art tensor tools.
(P: PARAFAC, T: Tucker, PN: PARAFAC-Nonnegative, TN: Tucker-Nonnegative, C: CMTF, Coupled Matrix-Tensor Factorization)

Functionality	BIGtensor	Tensor Toolbox	FlexiFaCT
Tensor Decomposition	P, PN, T, TN, C	P, PN, T	P, PN, C
Tensor Generation	Yes	Yes	No
Tensor-Tensor Operation	Yes	Yes	No
Tensor Manipulation	Yes	Yes	No
Distributed	Yes	No	Yes
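For reference, the models abbreviated in the legend have the following standard textbook forms for a 3-mode tensor X of size I × J × K. This is a general sketch, not BIGtensor's exact objective; in particular, coupling the matrix Y to the first mode in CMTF is an assumption chosen for illustration.

```latex
% PARAFAC (P), rank R: a sum of R rank-one tensors
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% Tucker (T), core tensor G of size P x Q x R
\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C},
\qquad x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}

% Nonnegative variants (PN, TN): the same models with A, B, C (and G) constrained to be >= 0

% CMTF (C): tensor X and matrix Y (assumed I x L) share the first-mode factor A
\min_{\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}}
\bigl\| \mathcal{X} - [\![ \mathbf{A}, \mathbf{B}, \mathbf{C} ]\!] \bigr\|_F^2
+ \bigl\| \mathbf{Y} - \mathbf{A} \mathbf{D}^{\top} \bigr\|_F^2
```

Here [[A, B, C]] denotes the PARAFAC reconstruction from the first formula, so CMTF fits a PARAFAC model and a matrix factorization that share the factor A.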

Scalability of BIGtensor and other tools. We report the mode length and the density of the largest data each tool processes using two representative tensor decomposition algorithms, PARAFAC and Tucker; their nonnegative versions and CMTF show similar performance. BIGtensor decomposes data with 100× longer modes than either of the other tools, and data 100× denser than the Tensor Toolbox can handle. A dash (-) marks a decomposition the tool does not support.

Scalability	Method	BIGtensor	Tensor Toolbox	FlexiFaCT
Mode Length & Nonzeros	PARAFAC	≥ 10^9	≤ 10^7	≤ 10^7
Mode Length & Nonzeros	Tucker	≥ 10^9	≤ 10^7	-
Density	PARAFAC	≥ 10^-5	≤ 10^-7	≥ 10^-5
Density	Tucker	≥ 10^-5	≤ 10^-7	-
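The density rows above are most naturally read as the number of nonzeros divided by the total number of cells (the product of the mode lengths). That definition is an assumption here, since the page does not state it; the helper below is a hypothetical sketch of it, not part of BIGtensor.

```python
from math import prod
from typing import Sequence


def tensor_density(shape: Sequence[int], nnz: int) -> float:
    """Return nnz / (I * J * K * ...), the fraction of cells that are nonzero."""
    if any(n <= 0 for n in shape):
        raise ValueError("all mode lengths must be positive")
    cells = prod(shape)
    if not 0 <= nnz <= cells:
        raise ValueError("nnz must be between 0 and the number of cells")
    return nnz / cells


# A 1K x 1K x 1K tensor with 10K nonzeros has density 10^-5.
print(tensor_density((1_000, 1_000, 1_000), 10_000))
```

Under this definition, a cubic tensor with mode length 10^9 at density 10^-5 would hold about 10^22 nonzeros, so the mode-length and density rows presumably come from separate experiments rather than from a single tensor.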

Code

The BIGtensor binary is available for download.

Download BIGtensor - v1.0

Dataset

Name	Structure	Dimensionality	Nonzeros	Download	Description
Microsoft Academic Graph	Paper - Author - Affiliation	123M × 123M × 2.7M	325M	Available	Papers and their metadata
NELL	NounPhrase1 - NounPhrase2 - Context	26M × 26M × 48M	144M	-	"Read the Web" Project
MovieLens	User - Movie - YearMonth	72K × 11K × 157	10M	Available	Movie rating data
YELP	User - Business - YearMonth	71K × 16K × 108	334K	Available	Business rating data
PhoneCall	Source - Destination - Date	30M × 30M × 62	1B	-	Phone call traffic data
Random	I - J - K	1K~1B × 1K~1B × 1K~1B	10K~10B	-	Synthetic random data
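The Random row is synthetic. As a hypothetical sketch of what such data looks like (not BIGtensor's own generator, whose method the page does not describe), the code below samples distinct cells uniformly at random into a coordinate-format (COO) dict and prints them as tab-separated "i, j, k, value" lines. That line layout is an assumption; check the BIGtensor documentation for the input format it actually expects. Everything is held in memory, so this only suits small sizes.

```python
import random
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]


def random_sparse_tensor(shape: Sequence[int], nnz: int, seed: int = 0) -> Dict[Coord, float]:
    """Sample `nnz` distinct cells uniformly at random, each with a value in (0, 1].

    Returns a COO dict mapping (i, j, k, ...) -> value. Rejection sampling is
    fast when nnz is far below the number of cells, i.e. for sparse tensors.
    """
    cells = 1
    for n in shape:
        cells *= n
    if nnz > cells:
        raise ValueError("nnz exceeds the number of cells")
    rng = random.Random(seed)
    entries: Dict[Coord, float] = {}
    while len(entries) < nnz:
        coord = tuple(rng.randrange(n) for n in shape)
        if coord not in entries:
            # 1.0 - random() lies in (0, 1], so every stored value is nonzero.
            entries[coord] = 1.0 - rng.random()
    return entries


def to_tsv_lines(entries: Dict[Coord, float]) -> List[str]:
    """Format entries as 'i<TAB>j<TAB>k<TAB>value' lines (hypothetical layout)."""
    return ["\t".join(str(i) for i in coord) + "\t" + repr(value)
            for coord, value in sorted(entries.items())]


# Tiny example: a 100 x 100 x 10 tensor with 1,000 nonzeros.
sample = random_sparse_tensor((100, 100, 10), 1_000, seed=42)
print(len(sample), to_tsv_lines(sample)[0])
```

A fixed seed makes the sample reproducible, which helps when comparing decomposition runs on the same synthetic input.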

People

Namyong Park

Department of Computer Science and Engineering
Seoul National University

Byungsoo Jeon

Department of Computer Science and Engineering
Seoul National University

Jungwoo Lee

Department of Computer Science and Engineering
Seoul National University

U Kang

Department of Computer Science and Engineering
Seoul National University