Static and Streaming Tucker Decomposition for Dense Tensors

Overview

Given a dense tensor, how can we efficiently discover hidden relations and patterns in static and online streaming settings? Tucker decomposition is a fundamental tool to analyze multidimensional arrays in the form of tensors. However, existing Tucker decomposition methods in both static and online streaming settings have limitations of efficiency since they directly deal with large dense tensors for the result of Tucker decomposition. In a static setting, although few static methods have tried to reduce their time cost by sampling tensors, sketching tensors, and efficient matrix operations, there remains a need for an efficient method. Moreover, streaming versions of Tucker decomposition are still time-consuming to deal with newly arrived tensors.

We propose D-Tucker and D-TuckerO, efficient Tucker decomposition methods for large dense tensors in static and online streaming settings, respectively. By decomposing a given large dense tensor with randomized singular value decomposition, avoiding the reconstruction from SVD results, and carefully determining the order of operations, D-Tucker and D-TuckerO efficiently obtain factor matrices and core tensor. Experimental results show that D-Tucker achieves up to 38.4x faster running times, and requires up to 17.2x less space than existing methods while having similar accuracy. Furthermore, D-TuckerO is up to 4.7x faster than existing streaming methods for each newly arrived tensor while its running time is proportional to the size of the newly arrived tensor, not the accumulated tensor.


Paper

D-Tucker and D-TuckerO is described in the following paper:


Code


Datasets

The following datasets are used for D-Tucker in a static setting.
Name#Order#DimensionalityDescriptionLinkDownload
Brainq3(360,21764,9) fMRI dataset. consist of (word,voxel, person; measurement) Brainq Download
Boats3(320,240,7000) gray scale videos. consist of (height,width,frame; value) Boats Download
Air Quality3(30648, 376, 6) air pollutants information in Korea. consists of (timestamp in hour, location, atmospheric pollutants; measurement) Korean Air Quality Download
HSI4(1021,1340,33,8) hyperspectral images of natural scenes. consist of (number of images, spatial dimension (x), spatial dimension (y), spectral dimension; value) HSI Download
The following datasets are used for D-TuckerO in an online streaming setting.
Name#Order#DimensionalityDescriptionLinkDownload
Stock3(3028,4,600) Korea Stock dataset. consist of (stock, feature, day; measurement) Stock Download
Walking Video3(1080, 1920, 600) gray scale videos. consist of (height,width,frame; value) Walking Video Download
FMA3(7994, 1025, 600) a song dataset whose form is (song, frequency, second; value) FMA Download
Traffic3(1084,96, 600) traffic volume dataset. Traffic Download

Update Note

2020-10-31: README file of AirQuality dataset is updated. Meta files (time dictionary, location dictionary, and pollutants dictionary) of the AirQuality dataset are added.