Research

Swift Stream Miner

How can we analyze and predict high-velocity data streams accurately? How can we discover interesting patterns in real time? How can we detect anomalies as quickly as possible? These questions are central to data stream mining and have recently attracted growing interest. Streaming data is common in the real world: IoT data, sensor data from cars, environmental sensing data, co-purchased products on e-commerce sites, financial transactions, messages in social networks, click streams on the web, network traffic, etc. One important characteristic of this kind of data is that it is generated continuously at very high speed. As a result, a stream mining stack must satisfy the following desiderata.

  • Space Efficiency: the information in a data stream is too large to store in memory or on disk. Thus, minimizing the required space is important.
  • Real-Time Processing: the stream is read only once, so each item must be processed, and any decision made, as it arrives.
  • Fast Query Answering: queries should be answered from a compact summary, without rescanning past data.
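
As a concrete illustration of these three requirements, the sketch below uses the classic Misra-Gries frequent-items summary. It is a textbook algorithm chosen here for illustration and is not taken from Swift Stream Miner itself; the class name `MisraGries` and the parameter `k` are assumptions made for this example. The summary keeps at most k - 1 counters (bounded space), updates once per arriving item (single pass), and answers a count query with one dictionary lookup.

```python
from typing import Dict, Hashable


class MisraGries:
    """Illustrative fixed-memory frequent-items summary (Misra-Gries)."""

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k                               # at most k - 1 counters are kept
        self.counters: Dict[Hashable, int] = {}
        self.n = 0                               # number of items seen so far

    def update(self, item: Hashable) -> None:
        """Process one stream item; the item is never stored or revisited."""
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def estimate(self, item: Hashable) -> int:
        """Lower bound on the true count; it is off by at most n / k."""
        return self.counters.get(item, 0)


if __name__ == "__main__":
    summary = MisraGries(k=3)
    for x in ["a", "a", "a", "b", "c", "a", "d", "a"]:
        summary.update(x)
    print(summary.estimate("a"), summary.n)
```

The estimate never exceeds the true count and undercounts by at most n / k, so a larger `k` trades memory for accuracy.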

In this project, we design and develop Swift Stream Miner, a fast and efficient stream mining software stack for analyzing high-velocity data streams. In particular, our focus is twofold.

  • Data Stream Intelligence: we assume that general N-dimensional data arrives continuously on the stream, where the dimension of the data may change frequently. We provide essential data stream analysis functionality to find and predict patterns, trends, and anomalies in a general data stream.
  • Graph Stream Intelligence: we assume that graph data arrives continuously on the stream. The graph data may be given in adjacency list format or sparse adjacency matrix format. We extract essential features from the graphs and use them to find and predict patterns, trends, and anomalies in a real-time graph stream.
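
One example of a feature that can be extracted from a graph stream is the number of triangles each node participates in. The sketch below is a generic edge-sampling estimator written for illustration; it is not the Swift Stream Miner implementation, and the class name `SampledTriangleCounter`, the sampling probability `p`, and the `seed` argument are assumptions of this example. It assumes a simple undirected graph in which each edge arrives once; an adjacency list row `u: v1, v2, ...` can be fed in as the edges `(u, v1)`, `(u, v2)`, and so on.

```python
import random
from collections import defaultdict
from typing import Hashable, Optional


class SampledTriangleCounter:
    """Illustrative local triangle estimator that keeps each edge with probability p."""

    def __init__(self, p: float, seed: Optional[int] = None) -> None:
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        self.p = p
        self.rng = random.Random(seed)
        self.adj = defaultdict(set)      # sampled edges only
        self.local = defaultdict(float)  # estimated triangle count per node

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        """Process one arriving edge; assumes no duplicate edges in the stream."""
        if u == v:
            return  # ignore self-loops
        # Each common neighbour in the sample closes a triangle with (u, v).
        common = self.adj.get(u, set()) & self.adj.get(v, set())
        # Both earlier edges were kept with probability p * p, so reweight.
        weight = 1.0 / (self.p * self.p)
        for w in common:
            self.local[u] += weight
            self.local[v] += weight
            self.local[w] += weight
        # Count first, then decide whether to keep the new edge.
        if self.rng.random() < self.p:
            self.adj[u].add(v)
            self.adj[v].add(u)


if __name__ == "__main__":
    counter = SampledTriangleCounter(p=1.0)  # p = 1 keeps every edge (exact counts)
    for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
        counter.add_edge(*edge)
    print(dict(counter.local))
```

With `p = 1.0` the counts are exact; a smaller `p` stores roughly a fraction `p` of the edges at the cost of higher variance in the estimates.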

Swift Stream Miner will run on various platforms, including standalone environments and distributed platforms.

Applications:

Swift Stream Miner will be used for various applications including:

  • IoT data (thermostat, smart meter, etc.) monitoring
  • Healthcare monitoring and prediction
  • Structure (building, bridge, water pipe, etc.) monitoring

Publications

  • Jun-gi Jang, Dongjin Choi, Jinhong Jung, and U Kang, "Zoom-SVD: Fast and Memory Efficient Method for Extracting Key Patterns in an Arbitrary Time Range", ACM International Conference on Information and Knowledge Management (CIKM) 2018, Lingotto, Turin, Italy. [BIBTEX] [HOMEPAGE] [PDF]
  • Yongsub Lim, Minsoo Jung, and U Kang, "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs", ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 12, issue 1, February 2018. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Yongsub Lim and U Kang, "Time-weighted Counting for Recently Frequent Pattern Mining in Data Streams", Knowledge and Information Systems (KAIS). doi:10.1007/s10115-017-1045-1 [BIBTEX] [PDF]
  • Minsoo Jung, Sunmin Lee, Yongsub Lim, and U Kang, "FURL: Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams", arXiv:1611.06615 [cs.DS], 26 November 2016. [BIBTEX] [PDF]
  • Yongsub Lim and U Kang, "MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams", 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2015, Sydney, Australia. [BIBTEX] [HOMEPAGE (CODE, DATA)] [PDF]
  • Yongsub Lim, Jihoon Choi, and U Kang, "Fast, Accurate, and Space-efficient Tracking of Time-weighted Frequent Items from Data Streams", 23rd ACM International Conference on Information and Knowledge Management (CIKM) 2014, Shanghai, China. [BIBTEX] [PDF]
  • Dongyeop Kang, DongGyun Han, NaHea Park, Sangtae Kim, U Kang, and Soobin Lee, "Eventera: Real-time Event Recommendation System from Massive Heterogeneous Online Media", IEEE International Conference on Data Mining (ICDM) 2014, Shenzhen, China. [BIBTEX] [PDF]