Billion-scale Tensor Decompositions
How can we find useful patterns and anomalies in large-scale
real-world data with multiple attributes? Tensors are well suited to modeling such multidimensional
data and are widely used to analyze social
networks, web data, network traffic, and data in many other settings.
HaTen2 is a scalable, distributed tensor decomposition algorithm for large-scale tensors that runs on the MapReduce framework.
HaTen2 decomposes tensors 100x larger than those existing methods can handle.
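To make "tensor decomposition" concrete, the following is a minimal single-machine sketch of one common decomposition, CP/PARAFAC, fitted with alternating least squares on a small dense 3-mode tensor using NumPy. It is not the HaTen2 implementation and is not distributed; the function names (`khatri_rao`, `unfold`, `cp_als`, `reconstruct`) and the parameters `rank`, `n_iter`, and `seed` are assumptions made for illustration only.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, rank)


def unfold(x, mode):
    """Matricize tensor x so that rows are indexed by the chosen mode."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def cp_als(x, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a dense 3-mode tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the two other factor matrices fixed and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def reconstruct(factors):
    """Rebuild the dense tensor from its three factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)
```

Calling `cp_als(x, rank=R)` returns three factor matrices, one per mode, and `reconstruct` rebuilds the approximated tensor from them. A dense sketch like this only works for toy sizes; the tensors listed on this page are sparse and far too large to materialize this way, which is the problem a distributed method such as HaTen2 targets.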
Download HaTen2 - v1.0
The binary code of HaTen2 is available here.
Paper
- HaTen2: Billion-scale Tensor Decompositions.
Inah Jeon, Evangelos E. Papalexakis, U Kang, Christos Faloutsos.
31st IEEE International Conference on Data Engineering (ICDE) 2015, Seoul, Korea.
- Mining Billion-Scale Tensors: Algorithms and Discoveries.
Inah Jeon, Evangelos E. Papalexakis, Christos Faloutsos, Lee Sael, U Kang.
The International Journal on Very Large Data Bases (VLDB)
Dataset
Name | Dimensionality | Nonzero | Source | Description |
Freebase-music | 23M x 23M x 166 | 99M | Freebase Web | Freebase RDF data. (Entity, Entity, Relation triples) |
Freebase-sampled | 38M x 38M x 532 | 139M | Freebase Web | Freebase RDF data. (Entity, Entity, Relation triples) |
NELL | 26M x 26M x 48M | 144M | NELL Web | Knowledgebase data (Subject, Object, Predicate triples) |
NELL-2 | 14K x 14K x 28K | 77M | NELL Web | Knowledgebase data (Subject, Object, Predicate triples) |
Phonecall | 30M x 30M x 62 | 184M | | Phone call history (Sender id, Receiver id, Date triples) |
DARPA1998 | 22K x 22K x 23M | 28M | DARPA Web | Network traffic records (Source IP, Destination IP, Time triples) |
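Each dataset above is described as a list of triples, which maps naturally onto a sparse 3-mode tensor in coordinate form. The sketch below shows one way such triples could be turned into integer coordinates with accumulated counts. It rests on several assumptions: the function name `triples_to_coo` is hypothetical, each mode gets its own index map (the entity-by-entity datasets above may instead share one vocabulary across the first two modes), and repeated triples are summed as counts. The actual file format of the released datasets is not specified here.

```python
from collections import Counter


def triples_to_coo(triples):
    """Convert (mode0, mode1, mode2) records to sparse coordinates with counts.

    Returns (shape, entries, index_maps), where `entries` maps an integer
    coordinate tuple to how many times that triple occurred.
    """
    index_maps = [{}, {}, {}]  # assumption: one independent vocabulary per mode
    entries = Counter()
    for record in triples:
        coord = tuple(
            # Assign the next free integer id the first time a value is seen.
            index_maps[mode].setdefault(value, len(index_maps[mode]))
            for mode, value in enumerate(record)
        )
        entries[coord] += 1  # assumption: duplicate triples accumulate as counts
    shape = tuple(len(m) for m in index_maps)
    return shape, dict(entries), index_maps
```

For example, a few (Sender id, Receiver id, Date) records passed to `triples_to_coo` yield a tensor shape equal to the number of distinct values per mode, and only the nonzero entries are stored, which is what keeps tensors with billions of possible cells manageable.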
People
Inah Jeon
Future IT R&D Lab
LG Electronics
Evangelos E. Papalexakis
Department of Computer Science
Carnegie Mellon University
Christos Faloutsos
Department of Computer Science
Carnegie Mellon University
Lee Sael
Department of Computer Science
SUNY
U Kang
Department of Computer Science and Engineering
Seoul National University