Paper 2

Differential Erasure Codes for Efficient Archival of Versioned Data in Cloud Storage Systems

Authors: J. Harshan, Anwitaman Datta, and Frédérique Oggier

Volume 30 (2016)

Abstract

In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner. The proposed technique is relevant in cloud settings, where, because of the huge volume of data to be stored, distributed (scale-out) storage systems deploying erasure codes for fault tolerance are typical. However, existing erasure coding techniques do not leverage redundancy of information across multiple versions of a file. We propose a new technique called differential erasure coding (DEC), where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding techniques, DEC opportunistically exploits the sparsity in the updates (i.e., when the differences between two successive versions have few non-zero entries) to store the deltas using sparse sampling techniques applied with erasure coding. We first show that DEC provides significant savings in the storage size for versioned data whenever the update patterns are characterized by in-place alterations. Subsequently, we propose a practical DEC framework so as to reap storage size benefits against not just in-place alterations but also real-world update patterns such as insertions and deletions that alter the overall data sizes. We conduct experiments with several synthetic and practical workloads to demonstrate that the practical variant of DEC provides significant reductions in storage overhead.
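
The notion of a sparse delta for an in-place alteration can be illustrated with a minimal Python sketch. It is not the DEC construction itself: the function names are hypothetical, the byte-wise XOR and the list of (position, value) pairs are assumptions standing in for the sparse sampling step, and the erasure coding stage is omitted entirely. It only shows that when two equal-length versions differ in a few symbols, the delta has few non-zero entries and suffices to recover the newer version from the older one.

```python
# Toy illustration only: names and representation are assumptions,
# not the scheme described in the paper. Erasure coding is omitted.

def delta(prev: bytes, curr: bytes) -> bytes:
    """Symbol-wise difference (byte-wise XOR) of two equal-length versions."""
    if len(prev) != len(curr):
        raise ValueError("this sketch only covers in-place alterations")
    return bytes(a ^ b for a, b in zip(prev, curr))

def sparsity(d: bytes) -> int:
    """Number of non-zero entries in the delta."""
    return sum(1 for s in d if s != 0)

def sparse_form(d: bytes) -> list[tuple[int, int]]:
    """Keep only the non-zero entries as (position, value) pairs."""
    return [(i, s) for i, s in enumerate(d) if s != 0]

def apply_delta(prev: bytes, entries: list[tuple[int, int]]) -> bytes:
    """Rebuild the newer version from the older one and the sparse delta."""
    out = bytearray(prev)
    for i, s in entries:
        out[i] ^= s
    return bytes(out)

v1 = b"versioned data in cloud storage"
v2 = b"versioned DATA in cloud storage"  # in-place change, same length

d = delta(v1, v2)
entries = sparse_form(d)
restored = apply_delta(v1, entries)
```

Here only the changed positions are kept, so the stored delta grows with the number of modified symbols rather than with the object size. Insertions and deletions change the length and are rejected by this sketch, which is the case the practical DEC framework in the abstract is meant to address.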