Given sparse multi-dimensional data (e.g., (user,
movie, time; rating) for movie recommendations), how can we
discover latent concepts/relations and predict missing values?
Tucker factorization has been widely used to solve such problems
with multi-dimensional data, which are modeled as tensors. However,
most Tucker factorization algorithms treat missing entries
as zeros, which leads to a highly inaccurate
decomposition. Moreover, the few methods that focus on accuracy
exhibit limited scalability, since they require huge memory and
heavy computation to update factor matrices.
In this paper, we propose P-TUCKER, a scalable Tucker
factorization method for sparse tensors. P-TUCKER performs
alternating least squares with a row-wise update rule in a fully
parallel way, which significantly reduces memory requirements
for updating factor matrices. Furthermore, we offer two variants
of P-TUCKER: a caching algorithm P-TUCKER-CACHE and an
approximation algorithm P-TUCKER-APPROX, both of which
accelerate the update process. Experimental results show that
P-TUCKER achieves a 1.7-14.1x speed-up and 1.4-4.8x lower error
compared to state-of-the-art methods. In addition, P-TUCKER scales
near-linearly with both the number of observable entries in a tensor
and the number of threads. Thanks to P-TUCKER, we successfully
discover hidden concepts and relations in a large-scale real-world
tensor, while existing methods cannot reveal latent features due
to their limited scalability or low accuracy.
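To make the key idea concrete, below is a minimal, single-threaded sketch of alternating least squares with a row-wise update rule over only the observed entries of a sparse tensor, i.e., the strategy that P-TUCKER parallelizes. All function names, signatures, and defaults are illustrative assumptions, not the authors' implementation, and the core tensor is held fixed here for brevity.

```python
import numpy as np

def reconstruct_entry(core, factors, index):
    """Predict one tensor entry by contracting the core with one row
    of each factor matrix (last mode first)."""
    g = core
    for m in reversed(range(len(index))):
        g = g @ factors[m][index[m]]  # contract the last remaining mode
    return g

def row_wise_als(idx, vals, shape, ranks, n_iter=10, lam=1e-3, seed=0):
    """Fit factor matrices to observed entries only (illustrative sketch).

    idx  : (E, N) integer array of observed coordinates
    vals : (E,)   observed values
    """
    rng = np.random.default_rng(seed)
    N = len(shape)
    factors = [rng.standard_normal((shape[n], ranks[n])) for n in range(N)]
    core = rng.standard_normal(ranks)  # kept fixed in this sketch

    for _ in range(n_iter):
        for n in range(N):
            # Bring mode n to the front, then contract the remaining
            # modes with the other factors' rows, once per observed entry.
            gperm = np.moveaxis(core, n, 0)
            others = [m for m in range(N) if m != n]
            deltas = np.empty((len(vals), ranks[n]))
            for e, index in enumerate(idx):
                g = gperm
                for m in reversed(others):
                    g = g @ factors[m][index[m]]
                deltas[e] = g
            # Each row of factor n solves its own small regularized
            # least-squares problem over the entries observed in it.
            for i in range(shape[n]):
                mask = idx[:, n] == i
                if not mask.any():
                    continue
                B = deltas[mask]
                lhs = B.T @ B + lam * np.eye(ranks[n])
                factors[n][i] = np.linalg.solve(lhs, B.T @ vals[mask])
    return core, factors
```

Each row's least-squares problem depends only on the entries observed in that row, so rows can be updated independently; this independence is what allows the fully parallel update and the small per-update memory footprint described above.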
Our proposed method, P-Tucker, is described in the following paper:
- Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries
Sejoon Oh, Namyong Park, Lee Sael, and U Kang.
34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France.
Samsung Humantech Paper Award (1st in Computer Science)
[PDF] [Supplementary Material]