Paper 1

Sensitivity – An Important Facet of Cluster Validation Process for Entity Matching Technique

Authors: Sumit Mishra, Samrat Mondal and Sriparna Saha

Volume 29 (2016)

Abstract

Cluster validity measure is one of the important components of cluster validation process in which once a clustering arrangement is found, then it is compared with the actual clustering arrangement or gold standard if it is available. For this purpose, different external cluster validity measures (VMs) are available. However, all the measures are not equally good for some specific clustering problem. For example, in entity matching technique, F-measure is a preferably used VM than McNemar index as the former satisfies a given set of desirable properties for entity matching problem. But we have observed that even if all the existing desirable properties are satisfied, then also some of the important differences between two clustering arrangements are not detected by some VMs. Thus we propose to introduce another property, termed as sensitivity, which can be added to the desirable property set and can be used along with the existing set of properties for the cluster validation process. In this paper, the sensitivity property of a VM is formally introduced and then the value of sensitivity is computed using the proposed identity matrix based technique. A comprehensive analysis is made to compare some of the existing VMs and then the suitability of the VMs with respect to the entity matching technique is obtained. Thus, this paper helps to improve the performance of the cluster validation process.