Given sparse multi-dimensional data (e.g., (user,
movie, time; rating) for movie recommendations), how can we
discover latent concepts/relations and predict missing values?
Tucker factorization has been widely used to solve such problems
with multi-dimensional data, which are modeled as tensors. However,
most Tucker factorization algorithms treat missing entries
as zeros, which leads to a highly inaccurate
decomposition. Moreover, the few methods that focus on accuracy
exhibit limited scalability, since they require huge memory and
heavy computation to update factor matrices.
In this paper, we propose P-TUCKER, a scalable Tucker
factorization method for sparse tensors. P-TUCKER performs
alternating least squares with a row-wise update rule in a fully
parallel way, which significantly reduces memory requirements
for updating factor matrices. Furthermore, we offer two variants
of P-TUCKER: a caching algorithm P-TUCKER-CACHE and an
approximation algorithm P-TUCKER-APPROX, both of which
accelerate the update process. Experimental results show that
P-TUCKER achieves a 1.7-14.1x speed-up and 1.4-4.8x lower error
compared to state-of-the-art methods. In addition, P-TUCKER scales
near-linearly with both the number of observable entries in a tensor
and the number of threads. Thanks to P-TUCKER, we successfully
discover hidden concepts and relations in a large-scale real-world
tensor, while existing methods cannot reveal latent features due
to their limited scalability or low accuracy.
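To make the key idea concrete, below is a minimal, single-threaded sketch of alternating least squares with a row-wise update rule over only the observed entries of a sparse tensor, i.e., the strategy that P-TUCKER parallelizes. All function names, signatures, and defaults are illustrative assumptions, not the authors' implementation, and the core tensor is held fixed here for brevity.

```python
import numpy as np

def reconstruct_entry(core, factors, index):
    """Predict one tensor entry by contracting the core with one row
    of each factor matrix (last mode first)."""
    g = core
    for m in reversed(range(len(index))):
        g = g @ factors[m][index[m]]  # contract the last remaining mode
    return g

def row_wise_als(idx, vals, shape, ranks, n_iter=10, lam=1e-3, seed=0):
    """Fit factor matrices to observed entries only (illustrative sketch).

    idx  : (E, N) integer array of observed coordinates
    vals : (E,)   observed values
    """
    rng = np.random.default_rng(seed)
    N = len(shape)
    factors = [rng.standard_normal((shape[n], ranks[n])) for n in range(N)]
    core = rng.standard_normal(ranks)  # kept fixed in this sketch

    for _ in range(n_iter):
        for n in range(N):
            # Bring mode n to the front, then contract the remaining
            # modes with the other factors' rows, once per observed entry.
            gperm = np.moveaxis(core, n, 0)
            others = [m for m in range(N) if m != n]
            deltas = np.empty((len(vals), ranks[n]))
            for e, index in enumerate(idx):
                g = gperm
                for m in reversed(others):
                    g = g @ factors[m][index[m]]
                deltas[e] = g
            # Each row of factor n solves its own small regularized
            # least-squares problem over the entries observed in it.
            for i in range(shape[n]):
                mask = idx[:, n] == i
                if not mask.any():
                    continue
                B = deltas[mask]
                lhs = B.T @ B + lam * np.eye(ranks[n])
                factors[n][i] = np.linalg.solve(lhs, B.T @ vals[mask])
    return core, factors
```

Each row's least-squares problem depends only on the entries observed in that row, so rows can be updated independently; this independence is what allows the fully parallel update and the small per-update memory footprint described above.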
Our proposed method, P-Tucker, is described in the following paper:
- Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries
Sejoon Oh, Namyong Park, Lee Sael, and U Kang.
34th IEEE International Conference on Data Engineering (ICDE) 2018, Paris, France.
Samsung Humantech Paper Award (1st in Computer Science)
[PDF] [Supplementary Material]