Paper 7 – TLDKS Journal

Anonymization of Data Sets with NULL Values

Authors: Margareta Ciglic, Johann Eder, and Christian Koncilia

Volume 24 (2016)

Abstract

Releasing, publishing, or transferring microdata is restricted by the need to protect the privacy of data owners. K-anonymity is one of the most widespread concepts for anonymizing microdata, but it does not explicitly cover NULL values, which are nevertheless frequently found in microdata. We study the problem of NULL values (missing values, non-applicable attributes, etc.) for anonymization in detail, present a set of new definitions for k-anonymity that explicitly consider NULL values, and analyze which definition protects against which attacks. We show that adequate treatment of missing values in microdata can be achieved easily by extending generalization algorithms. In particular, we show how the proposed treatment of NULL values was incorporated into the anonymization tool ANON, which implements generalization and tuple suppression with an application-specific definition of information loss. In a series of experiments, we show that NULL-aware generalization algorithms incur less information loss than standard algorithms.
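
To make the question raised in the abstract concrete, the following minimal sketch shows how the outcome of a k-anonymity check can depend on how NULL is compared. It is an illustration only: the two matching rules, the function names (`strict_match`, `wildcard_match`, `is_k_anonymous`), and the sample rows are assumptions made here, not the definitions from the paper and not the ANON implementation. NULL is represented by Python's `None`.

```python
from typing import Callable, Optional, Sequence

# A row holds only quasi-identifier values; None stands in for NULL.
Row = Sequence[Optional[str]]

def strict_match(a: Row, b: Row) -> bool:
    """Assumed rule 1: NULL is an ordinary value and matches only NULL."""
    return all(x == y for x, y in zip(a, b))

def wildcard_match(a: Row, b: Row) -> bool:
    """Assumed rule 2: NULL on either side matches any value."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))

def is_k_anonymous(rows: Sequence[Row], k: int,
                   match: Callable[[Row, Row], bool]) -> bool:
    """Each row must match at least k rows, counting itself."""
    return all(sum(match(r, other) for other in rows) >= k for r in rows)

# Hypothetical generalized table: (age range, postal code).
rows = [
    ("30-39", "9020"),
    ("30-39", None),
    ("40-49", "9020"),
    ("40-49", None),
]

print(is_k_anonymous(rows, 2, strict_match))
print(is_k_anonymous(rows, 2, wildcard_match))
```

Under the strict rule, every row in this table matches only itself, so the table is not 2-anonymous; under the wildcard rule, each row matches exactly two rows and the check passes. The wildcard rule is not transitive, so rows no longer fall into clean equivalence classes, which is one reason the choice of NULL semantics affects both the protection offered and the amount of generalization needed.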