Paper 1

A Parallel Quasi-identifier Discovery Scheme for Dependable Data Anonymisation

Authors: Nikolai J. Podlesny, Anne V. D. M. Kayem and Christoph Meinel

Volume 50 (2021)

Abstract

Quasi-identifiers (QIDs) are attribute combinations that can be used to discover hidden personally identifying information from an anonymised dataset. Typically, the information drawn from such QIDs can then be combined with more publicly accessible datasets to discover sensitive information (e.g. medical conditions, financial status, criminal history, …). Research on data anonymisation has therefore proposed various algorithms to discover and transform quasi-identifiers efficiently in order to prevent re-identification. However, all existing algorithms are inefficient and fail to prevent re-identification attacks on large, real-world, high-dimensional datasets. This paper presents a quasi-identifier discovery algorithm that combines parallelism with an efficient search technique to find all minimal quasi-identifiers in a given dataset. As a further step, we present an adversary model based on the enumeration problem of discovering unique column combinations in a dataset. We demonstrate that our quasi-identifier discovery algorithm is secure against re-identification attacks under this adversary model, even in the presence of large high-dimensional datasets that change dynamically.
Our empirical results show that our algorithm not only scales well to large high-dimensional datasets but also exploits its parallelisability on GPU (Graphics Processing Unit) architectures to prevent re-identification, even in the presence of a powerful adversary equipped with similar high-performance computing power. Furthermore, our results show that the proposed GPU algorithm offers up to a 100x speedup over the algorithm’s CPU version.
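To make the abstract's notion of minimal quasi-identifiers concrete, the following Python sketch enumerates minimal unique column combinations: sets of columns whose projected values single out every record, where no proper subset does so. It is a plain sequential, level-wise (Apriori-style) search written under our own assumptions, not the authors' parallel GPU algorithm; the function names and the toy records are hypothetical. Its worst-case cost grows exponentially with the number of columns, which is why an efficient, parallel search matters on high-dimensional data.

```python
from itertools import combinations
from typing import Sequence

# Hypothetical helper names; the toy records below are invented for illustration.


def is_unique(rows: Sequence[tuple], cols: tuple[int, ...]) -> bool:
    """True if no two rows agree on every column in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)  # project the row onto `cols`
        if key in seen:
            return False  # two records share this projection
        seen.add(key)
    return True


def minimal_unique_combinations(rows: Sequence[tuple], n_cols: int) -> list[tuple[int, ...]]:
    """Level-wise search for all minimal unique column combinations.

    Any superset of a unique combination is also unique but not minimal,
    so candidates containing an already-found minimal combination are pruned.
    """
    minimal: list[tuple[int, ...]] = []
    for k in range(1, n_cols + 1):  # candidate size, smallest first
        for cols in combinations(range(n_cols), k):
            if any(set(m) <= set(cols) for m in minimal):
                continue  # superset of a known minimal combination
            if is_unique(rows, cols):
                minimal.append(cols)
    return minimal


names = ["zip", "age", "sex", "condition"]
rows = [
    ("10115", 34, "F", "flu"),
    ("10115", 34, "M", "asthma"),
    ("10117", 34, "F", "flu"),
    ("10117", 51, "M", "flu"),
]

result = minimal_unique_combinations(rows, len(names))
print([[names[c] for c in combo] for combo in result])
# → [['zip', 'sex'], ['zip', 'age', 'condition']]
```

Within a single level k, each candidate is compared only against minimal combinations found at smaller sizes, so the uniqueness checks for that level are independent of one another. That independence is the kind of structure a parallel implementation could exploit, although this sketch runs sequentially.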