Paper 4

Towards A Scalable Semantic Provenance Management System

Authors: Mohamed Amin Sakka and Bruno Defude

Volume 7 (2012)

Abstract

Provenance is key metadata for assessing the trustworthiness of electronic documents. It gives an indication of the reliability and quality of the document content. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware and provide heterogeneous, decentralized, and non-interoperable provenance data. Most provenance management systems are dedicated either to a specific application (workflow, database) or to a specific data type. These systems were not designed to support provenance over distributed and heterogeneous sources. As a result, end users face different provenance models and different query languages. For these reasons, modeling, collecting, and querying provenance across heterogeneous distributed sources is still considered a challenging task. This work presents a new provenance management system (PMS) based on semantic web technologies. It imports provenance sources and enriches them semantically to obtain a high-level representation of provenance. It supports semantic correlation between different provenance sources and allows the use of a high-level semantic query language. In the context of cloud infrastructure, where most applications will be deployed in the near future, scalability is a major issue for provenance management systems. We describe here an implementation of our PMS based on a NoSQL database management system coupled with the map-reduce parallel model and show that it scales linearly with the size of the processed logs.