Paper 2
Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments
Authors: Anne V. D. M. Kayem, C. T. Vester, Christoph Meinel
Abstract

Resource-constrained environments (RCEs) are remote or rural regions of the developing world where a lack of specialised expertise and computational processing power hinders data analytics operations. Outsourcing to third-party data analytics service providers offers a cost-effective management solution. However, the data must first be anonymised before it is shared, to protect against privacy violations. Syntactic anonymisation algorithms (k-anonymisation, l-diversity, and t-closeness) are an attractive solution for RCEs because the data they generate is not tied to a specific use case. Finding optimal anonymisations under these models has, however, been shown to be NP-hard, so the algorithms need to be refactored to run efficiently with limited processing power. In previous work [23], we presented a method of extending the standard k-anonymisation and l-diversity algorithms to satisfy both data utility and privacy requirements. We used a multi-objective optimisation scheme to minimise information loss and maximise privacy. Our results showed that the extended l-diversity algorithm incurs higher information loss than the extended k-anonymity algorithm, but offers better protection against inferential disclosure. The additional information loss (7%) was negligible and did not negatively affect data utility. In this paper, we extend this result with a modified t-closeness algorithm based on clustering. The aim is to provide a performance-efficient algorithm that maintains the low information loss of our extended k-anonymisation and l-diversity algorithms while also protecting against skewness and similarity attacks.

Keywords: Data anonymisation, k-anonymity, l-diversity, t-closeness, Clustering, Privacy, Information loss.
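The abstract names the three syntactic privacy models without defining them. The following minimal sketch is not the authors' clustering algorithm. It only checks one equivalence class (a group of records sharing the same generalised quasi-identifier values) against the standard textbook definitions: k-anonymity (at least k records), distinct l-diversity (at least l distinct sensitive values), and t-closeness (the class's sensitive-value distribution lies within distance t of the whole table's). It assumes a categorical sensitive attribute and uses the variational distance, which equals the Earth Mover's Distance when all categories are equally far apart. The function names and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical helpers illustrating the standard definitions of the three
# syntactic privacy models; this is NOT the clustering algorithm of the paper.


def distribution(values):
    """Return the empirical distribution of a list of sensitive values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def is_k_anonymous(group, k):
    """An equivalence class is k-anonymous if it holds at least k records."""
    return len(group) >= k


def is_l_diverse(group, ell):
    """Distinct l-diversity: at least `ell` different sensitive values."""
    return len(set(group)) >= ell


def variational_distance(p, q):
    """Half the L1 distance between two categorical distributions.

    Assumes equal ground distance between all categories, in which case this
    equals the Earth Mover's Distance used in the t-closeness definition.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def is_t_close(group, table, t):
    """t-closeness: class distribution within distance t of the table's."""
    return variational_distance(distribution(group), distribution(table)) <= t


# Hypothetical sensitive values for the whole table and two equivalence classes.
table = ["flu", "flu", "cancer", "hiv", "flu", "cancer", "flu", "hiv"]
group_a = ["flu", "cancer", "hiv", "flu"]  # mirrors the table's distribution
group_b = ["flu", "flu", "flu", "cancer"]  # skewed towards "flu"

for name, group in [("a", group_a), ("b", group_b)]:
    dist = variational_distance(distribution(group), distribution(table))
    print(name, is_k_anonymous(group, 3), is_l_diverse(group, 2),
          dist, is_t_close(group, table, 0.2))
# prints:
# a True True 0.0 True
# b True True 0.25 False
```

Class b satisfies 3-anonymity and distinct 2-diversity, yet its sensitive values are skewed away from the table's overall distribution, so it fails 0.2-closeness. This is the kind of skewness the t-closeness model is meant to catch. For numeric or hierarchical sensitive attributes, the ground distance between values is no longer uniform and a different Earth Mover's Distance computation would be needed.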