Paper 2

Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments

Authors: Anne V. D. M. Kayem, C. T. Vester, Christoph Meinel

Volume 38 (2018)

Abstract

Resource constrained environments (RCEs) are remote or rural regions of the developing world where a lack of specialised expertise and computational processing power hinders data analytics operations. Outsourcing to third-party data analytics service providers offers a cost-effective management solution. However, the data must be anonymised before it is shared, to protect against privacy violations. Syntactic anonymisation algorithms (k-anonymisation, l-diversity, and t-closeness) are an attractive solution for RCEs because the generated data is not use-case specific. These algorithms have, however, been shown to be NP-hard, and therefore need to be refactored to run efficiently with limited processing power. In previous work [23], we presented a method of extending the standard k-anonymisation and l-diversity algorithms to satisfy both data utility and privacy. We used a multi-objective optimisation scheme to minimise information loss and maximise privacy. Our results showed that the extended l-diversity algorithm incurs higher information loss than the extended k-anonymisation algorithm, but offers better privacy in terms of protection against inferential disclosure. The additional information loss (7%) was negligible and did not negatively affect data utility. In this paper, we extend this result with a modified t-closeness algorithm based on clustering. The aim is to provide a performance-efficient algorithm that maintains the low information loss of our extended k-anonymisation and l-diversity algorithms, while also protecting against skewness and similarity attacks.

Keywords: Data anonymisation, k-anonymity, l-diversity, t-closeness, Clustering, Privacy, Information loss.
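
For readers unfamiliar with the three syntactic privacy models named above, the following is a minimal Python sketch of how each property can be checked on an already generalised table. It is not the authors' extended or clustering-based algorithm. The helper names, the toy records, and the thresholds are all hypothetical, and the t-closeness check assumes a categorical sensitive attribute with equal ground distance, in which case the Earth Mover's Distance reduces to the variational distance.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row)
    return list(classes.values())


def is_k_anonymous(rows, quasi_ids, k):
    """Every equivalence class must contain at least k records."""
    return all(len(c) >= k for c in equivalence_classes(rows, quasi_ids))


def is_l_diverse(rows, quasi_ids, sensitive, l):
    """Distinct l-diversity: each class holds at least l distinct sensitive values."""
    return all(
        len({r[sensitive] for r in c}) >= l
        for c in equivalence_classes(rows, quasi_ids)
    )


def distribution(rows, sensitive):
    """Relative frequency of each sensitive value in the given records."""
    counts = Counter(r[sensitive] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def max_closeness_distance(rows, quasi_ids, sensitive):
    """Largest variational distance between a class and the whole table.

    Assumption: categorical attribute with equal ground distance.
    """
    overall = distribution(rows, sensitive)
    worst = 0.0
    for c in equivalence_classes(rows, quasi_ids):
        local = distribution(c, sensitive)
        dist = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, dist)
    return worst


def is_t_close(rows, quasi_ids, sensitive, t):
    """Every class distribution must lie within distance t of the overall one."""
    return max_closeness_distance(rows, quasi_ids, sensitive) <= t


# Hypothetical toy table, already generalised on age and zip code.
TOY_ROWS = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
QUASI_IDS = ("age", "zip")
```

On this toy table, `is_k_anonymous(TOY_ROWS, QUASI_IDS, 2)` holds, but `is_l_diverse(TOY_ROWS, QUASI_IDS, "disease", 2)` does not, because the second class contains only one disease value. This is the kind of gap that motivates moving from k-anonymity to l-diversity, and then to t-closeness, which bounds how far any class distribution may drift from the overall one.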