CTD:
Fast, Accurate, and Interpretable Method for Static and Dynamic Tensor Decompositions


Overview


How can we find patterns and anomalies in a tensor, or multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives each time step? Finding patterns and anomalies in a tensor is a crucial problem with many applications, including building safety monitoring, patient health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard PARAFAC and Tucker decomposition results are not directly interpretable. Although a few sampling-based methods have previously been proposed towards better interpretability, they need to be made faster, more memory efficient, and more accurate.

In this paper, we propose CTD, a fast, accurate, and directly interpretable tensor decomposition method based on sampling. CTD-S, the static version of CTD, provably guarantees a high accuracy that is up to 11x more accurate than that of the state-of-the-art method experimentally. Also, CTD-S is made up to 2.3x faster, and up to 24x more memory-efficient than the state-of-the-art method by removing redundancy. CTD-D, the dynamic version of CTD, is the first interpretable dynamic tensor decomposition method ever proposed. Also, it is made up to 82x faster than already fast CTD-S by exploiting factors at previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in the online distributed denial of service (DDoS) attack detection and online troll detection.

Paper


Code

The source codes used in the paper are available.

Comparison

Comparison of our proposed CTD and the existing tensor-CUR. The static method CTD-S outperforms the state of-the-art tensor-CUR in terms of time, memory usage, and accuracy. The dynamic method CTD-D is the fastest.

Existing [Proposed]
Tensor-CUR CTD-S CTD-D
Interpretability
Time Fast Faster Fastest
Memory usage Low Lower Low
Accuracy Low High High
Online

Dataset

NameStructureSizeNonzeroDownload
Facebook-wall User 1 - User 2 - Time 63,891 × 63,890 × 1,504 738,485 DOWN
Facebook-wall (syntehtic) User 1 - User 2 - Time 63,891 × 63,890 × 1,504 1,169,656 DOWN
Hyperspectral Image X - Y - Frequency 538 × 323 × 148 25,715,854 DOWN
Infectious Person - Person - Time 407 × 410 × 1,392 17,298 DOWN
Hypertext 2009 Attendee - Attendee - Time 112 × 113 × 5,246 20,818 DOWN
Haggle Person - Person - Time 77 × 274 × 1,567 27,972 DOWN
CAIDA Source IP - Destination IP - Time 189 × 189 × 1,000 20,511 DOWN
CAIDA (synthetic) Source IP - Destination IP - Time 189 × 189 × 1,000 46,102 DOWN

People