Paper 4 – TLDKS Journal

A Self-Adaptive and Incremental Approach for Data Profiling in the Semantic Web

Authors: Kenza Kellou-Menouer and Zoubida Kedad

Volume 29 (2016)

Abstract

The increasing adoption of linked data principles has led to the availability of a large number of datasets on the Web. However, the use of these datasets is hindered by the lack of descriptive information about their content. Indeed, interlinking, matching or querying them requires some knowledge about the types and properties they contain.

In this paper, we tackle the problem of describing the content of an RDF dataset by profiling its entities, which consists in discovering the implicit types and providing their description. Each type is described by a profile composed of properties and their probabilities. Our approach relies on a clustering algorithm. It is self-adaptive, as it can automatically detect the most appropriate similarity threshold according to the dataset. Our algorithms generate overlapping clusters, enabling the detection of several types for an entity. As a dataset may evolve, our approach is incremental and can assign a type to a new entity and update the type profiles without browsing the whole dataset. We also present some experimental evaluations to demonstrate the effectiveness of our approach.
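
To make the idea of a type profile more concrete, the following Python sketch illustrates how an incoming entity might be compared against existing profiles, assigned to several types at once, and folded into those profiles without rescanning the dataset. It is not the authors' algorithm: the names TypeProfile, similarity and assign, the probability-weighted Jaccard measure, and the fixed threshold are all assumptions made for illustration. In particular, the automatic detection of the similarity threshold described in the abstract is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TypeProfile:
    """Hypothetical type profile: properties with their probabilities."""
    size: int = 0                               # entities assigned so far
    counts: dict = field(default_factory=dict)  # property -> entity count

    def probability(self, prop):
        """Fraction of the type's entities that carry this property."""
        return self.counts.get(prop, 0) / self.size if self.size else 0.0

    def add(self, properties):
        """Incremental update: only this profile is touched."""
        self.size += 1
        for prop in properties:
            self.counts[prop] = self.counts.get(prop, 0) + 1


def similarity(properties, profile):
    """Assumed measure: Jaccard weighted by property probabilities."""
    properties = set(properties)
    if not properties and not profile.counts:
        return 0.0
    # Numerator: min(1, p) summed over the entity's properties.
    shared = sum(profile.probability(p) for p in properties)
    # Denominator: 1 for each entity property, p for profile-only properties.
    extra = sum(profile.probability(p)
                for p in profile.counts if p not in properties)
    return shared / (len(properties) + extra)


def assign(properties, profiles, threshold):
    """Assign an entity to every sufficiently similar type (overlapping)."""
    properties = set(properties)
    matched = [name for name, profile in profiles.items()
               if similarity(properties, profile) >= threshold]
    for name in matched:
        profiles[name].add(properties)
    return matched


# Illustrative data only; these entities and property names are made up.
profiles = {"Person": TypeProfile(), "Author": TypeProfile()}
profiles["Person"].add({"name", "birthDate"})
profiles["Person"].add({"name", "birthDate", "email"})
profiles["Author"].add({"name", "wrote"})

# The threshold is fixed by hand here (an assumption of this sketch).
types = assign({"name", "birthDate", "wrote"}, profiles, threshold=0.5)
print(types)
```

Keeping per-property counts rather than stored probabilities is one simple way to make the update incremental: adding an entity only increments a few counters in the matched profiles, and the probabilities are derived on demand.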