Paper 2

Batch Composite Transactions in Stream Processing

Authors: K. Vidyasankar

Volume 34 (2017)

Abstract

Stream processing is about processing continuous streams of data by programs in a workflow. Continuous execution is discretized by grouping input stream tuples into batches and using one batch at a time for the execution of programs. As source input batches arrive con- tinuously, several batches may be processed in the workflow simultane- ously. Ensuring correctness of these concurrent executions is important. As in databases and several advanced applications, the transaction con- cept can be applied to regulate concurrent executions and ensure their correctness in stream processing. The rst step is de ning transactions corresponding to the executions in a meaningful way. A general require- ment in stream processing is that each batch be processed completely in the workflow. That is, all the programs triggered by the batch, directly and transitively, in the workflow must be executed successfully. Then, considering each program execution as a transaction, all the transactions involved in processing a batch can be grouped into a single batch compos- ite transaction, abbreviated as BCT, and transactional properties applied to these BCTs. This works well when a batch is processed individually and completely in isolation. However, when the batches are split, merged or overlapped along the workflow computation, the resulting BCTs will have some transactions in common and applying transactional properties for them becomes complicated. We overcome the problems by de ning nonblocking BCTs that have disjoint collections of transactions. They satisfy some properties analogous to those of the database transactions and facilitate (i) de ning correctness of concurrent executions in terms of equivalent serial executions of composite transactions and (ii) process- ing each batch either completely or not at all, and rolling back partially processed batches without a ecting the processing of other batches. We also suggest an appropriate roll back mechanism.