Given a dense tensor, how can we efficiently discover hidden relations and patterns in static and online streaming settings? Tucker decomposition is a fundamental tool to analyze multidimensional arrays in the form of tensors. However, existing Tucker decomposition methods in both static and online streaming settings have limitations of efficiency since they directly deal with large dense tensors for the result of Tucker decomposition. In a static setting, although few static methods have tried to reduce their time cost by sampling tensors, sketching tensors, and efficient matrix operations, there remains a need for an efficient method. Moreover, streaming versions of Tucker decomposition are still time-consuming to deal with newly arrived tensors.
We propose D-Tucker and D-TuckerO, efficient Tucker decomposition methods for large dense tensors in static and online streaming settings, respectively. By decomposing a given large dense tensor with randomized singular value decomposition, avoiding the reconstruction from SVD results, and carefully determining the order of operations, D-Tucker and D-TuckerO efficiently obtain factor matrices and core tensor. Experimental results show that D-Tucker achieves up to 38.4x faster running times, and requires up to 17.2x less space than existing methods while having similar accuracy. Furthermore, D-TuckerO is up to 4.7x faster than existing streaming methods for each newly arrived tensor while its running time is proportional to the size of the newly arrived tensor, not the accumulated tensor.
D-Tucker and D-TuckerO is described in the following paper:
The paper is accepted to ICDE2020 as a short paper.
[Paper] [Bibtex]The paper is accepted to TKDD journal, 2023.
[Paper] [Bibtex]Name | #Order | #Dimensionality | Description | Link | Download |
---|---|---|---|---|---|
Brainq | 3 | (360,21764,9) | fMRI dataset. consist of (word,voxel, person; measurement) | Brainq | Download |
Boats | 3 | (320,240,7000) | gray scale videos. consist of (height,width,frame; value) | Boats | Download |
Air Quality | 3 | (30648, 376, 6) | air pollutants information in Korea. consists of (timestamp in hour, location, atmospheric pollutants; measurement) | Korean Air Quality | Download |
HSI | 4 | (1021,1340,33,8) | hyperspectral images of natural scenes. consist of (number of images, spatial dimension (x), spatial dimension (y), spectral dimension; value) | HSI | Download |
Name | #Order | #Dimensionality | Description | Link | Download |
---|---|---|---|---|---|
Stock | 3 | (3028,4,600) | Korea Stock dataset. consist of (stock, feature, day; measurement) | Stock | Download |
Walking Video | 3 | (1080, 1920, 600) | gray scale videos. consist of (height,width,frame; value) | Walking Video | Download |
FMA | 3 | (7994, 1025, 600) | a song dataset whose form is (song, frequency, second; value) | FMA | Download |
Traffic | 3 | (1084,96, 600) | traffic volume dataset. | Traffic | Download |