S3CMTF:
Fast, Accurate, and Scalable Method for Sparse Coupled Matrix-Tensor Factorization


Overview


How can we extract hidden relations from a tensor and a matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for the purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data are growing explosively. However, existing methods for CMTF suffer from lack of accuracy, slow running time, and limited scalability.

In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods which do not support sparse tensors or do not model complicated relationships between factors, S3CMTF provides sparse Tucker factorization by carefully deriving gradient update rules. We also show that lock-free parallel SGD is useful for S3CMTF in multi-core shared memory systems. S3CMTF further boosts the performance by carefully storing intermediate computation and reusing them. We theoretically and empirically show that S3CMTF is the fastest, outperforming existing methods. Experimental results show that S3CMTF is up to 989x faster than existing methods while providing the best accuracy. S3CMTF shows linear scalability on the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp recommendation tensor data coupled with 3 additional matrices to discover interesting patterns.

Paper


Code

The source codes used in the paper are available.

Comparison

Comparison of our proposed S3CMTF and existing CMTF methods. S3CMTF outperforms the state of-the-art single machine CMTF methods in terms of time, accurracy, and speed.

Existing [Proposed]
CMTF-Tucker-ALS CMTF-OPT S3CMTF-base S3CMTF-opt
Time slow slow fast faster
Accuracy low low high high
Scalability low low high high
Memory high high lower low
Parallel no no yes yes

Dataset

NameStructureSizeNonzeroDownload
MovieLens User - Movie - Time 71K × 11K × 157 10M DOWN
Movie - Genre 11K × 20 214K
Netflix User - Movie - Time 480K × 18K × 74 100M DOWN
Movie - Yearmonth 18K × 110 2M
Yelp User - Business - Time 1M × 144K × 149 4M DOWN
User - User 1M × 1M 7M
Business - Category 144K × 1K 172M
Business - City 144K × 1K 126M

People