Paper 2

Processing Exact Results forWindowed Stream Joins in a Memory-limited System: A Disk-based, adaptive Approach

Authors: Abhirup Chakraborty and Ajit Singh

Volume 7 (2012)

Abstract

We consider the problem of processing exact results for sliding window joins over data streams with limited memory. Existing approaches either, (1) deal with memory limitations by shedding loads, and therefore cannot provide exact or even highly accurate results for sliding window joins over data streams showing time-varying rate of data arrivals, or (2) suffer from large I/O overhead due to random disk flushes and disk-to-disk stages with a stream join, making the approaches inefficient to handle sliding window joins. We provide an Adaptive, Hash-partitioned Exact Window Join (AH-EWJ) algorithm incorporating disk storage as an archive. Our algorithm spills window data onto the disk on a periodic basis, refines the output result by properly retrieving the disk-resident data, maximizes output rate by employing techniques to manage the memory blocks, and continuously adjusting the allocated memory within the stream windows. The problem of managing the window blocks in memory—similar in nature to the caching issue—captures both the temporal and frequency related properties of the stream arrivals.We present a baseline algorithm called Rate-based Progressive Window Joins (RPWJ), which extends an existing algorithm to tune the performance by reducing disk I/O overhead while processing sliding window joins. We provide experimental results demonstrating the performance and effectiveness of the proposed algorithm.