Paper 4

Improving Clustering-based Schema Matching using Latent Semantic Indexing

Authors: Alsayed Algergawy, Seham Moawed, Amany Sarhan, Ali Eldosouky, and Gunter Saake

Volume 15 (2014)

Abstract

The increasing size and widespread use of XML data and of different types of ontologies pose a major challenge: how to integrate these data. A critical step towards such integration is to identify and discover semantically corresponding elements across heterogeneous data sets. This identification process becomes increasingly difficult when dealing with large schemas and ontologies. Clustering-based matching is a significant step towards reducing the search space and thus improving matching efficiency. However, current methods for identifying similar clusters depend on literal term matching. To maintain high matching quality along with high matching efficiency, hidden semantic relationships among the clusters' elements should be discovered. To this end, in this paper we propose a Latent Semantic Indexing (LSI)-based approach that retrieves the conceptual meaning shared between clusters. The experimental evaluations show that the proposed approach yields encouraging and significant improvements towards building large-scale matching approaches.
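To make the LSI idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each cluster is represented as a bag of element-name tokens, uses raw term counts with no weighting, takes a user-chosen latent rank `k`, and compares clusters by cosine similarity in the latent space. The function names `jacobi_eigen` and `lsi_cluster_similarity` are hypothetical. To stay dependency-free, it obtains the truncated SVD of the term-by-cluster matrix A from a Jacobi eigendecomposition of the small Gram matrix AᵀA, whose eigenvalues are the squared singular values. In practice a library SVD routine would normally be used instead.

```python
import math
from collections import Counter


def jacobi_eigen(sym, max_sweeps=100, tol=1e-12):
    """Eigendecomposition of a small symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, vectors); column i of `vectors` belongs to eigenvalues[i].
    """
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsi_cluster_similarity(clusters, k):
    """Cosine similarity between clusters in a rank-k latent semantic space.

    clusters: list of token lists (assumed: one bag of element-name tokens per cluster).
    """
    counts = [Counter(tokens) for tokens in clusters]
    n = len(counts)
    # Gram matrix A^T A of the term-by-cluster count matrix A (size n x n).
    gram = [[float(sum(ci[t] * cj[t] for t in ci)) for cj in counts] for ci in counts]
    eigvals, vecs = jacobi_eigen(gram)
    # Keep the k largest eigenvalues, i.e. the k largest singular values squared.
    order = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Latent coordinates of cluster j: row j of V_k * Sigma_k, sigma_i = sqrt(lambda_i).
    latent = [[math.sqrt(max(eigvals[i], 0.0)) * vecs[j][i] for i in order] for j in range(n)]

    def cosine(x, y):
        nx = math.sqrt(sum(xi * xi for xi in x))
        ny = math.sqrt(sum(yi * yi for yi in y))
        if nx == 0.0 or ny == 0.0:
            return 0.0
        return sum(xi * yi for xi, yi in zip(x, y)) / (nx * ny)

    return [[cosine(latent[i], latent[j]) for j in range(n)] for i in range(n)]
```

For example, take the toy clusters `[["author", "title"], ["writer", "title"], ["price", "currency"]]`. With `k=2`, the first two clusters reach a latent cosine of about 1.0, although their literal cosine is 0.5. The rank-2 truncation drops the weakest latent dimension, which is the only one that separates `author` from `writer`. The third cluster stays orthogonal to both. With `k` equal to the number of clusters, nothing is truncated and the literal cosine values are recovered.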