Paper 4

Increasing Coverage in Distributed Search and Recommendation with Profile Diversity

Authors: Maximilien Servajean, Esther Pacitti, Miguel Liroz-Gistau, Sihem Amer-Yahia, and Amr El Abbadi

Volume 22 (2015)

Abstract

With the advent of Web 2.0, users are producing ever larger amounts of diverse data, which are stored in a wide variety of systems. Since the users’ data spaces are scattered among those independent systems, data sharing becomes a challenging problem. Distributed search and recommendation provides a general solution for data sharing and, among its various alternatives, gossip-based approaches are particularly interesting as they provide scalability, dynamicity, autonomy and decentralized control. Generally, in these approaches each participant maintains a cluster of “relevant” users, who are later employed in query processing. However, as we show in the paper, considering only relevance in the construction of the cluster introduces a significant amount of redundancy among users, which in turn leads to reduced recall. Indeed, when a query is submitted, the high similarity among the users in a cluster increases the probability of retrieving the same set of relevant items, thus limiting the number of distinct results that can be obtained. In this paper, we propose a gossip-based search and recommendation approach that relies on diversity-based clustering scores. We present the resulting new gossip-based clustering algorithms and validate them through experimental evaluation over four real datasets, based on MovieLens-small, MovieLens, LastFM and Delicious. Compared with state-of-the-art solutions, we show that taking diversity-based clustering scores into account yields major gains in recall while reducing the number of users involved in query processing.
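
The redundancy argument in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes item-set profiles, Jaccard similarity as the relevance measure, and a simple greedy score (relevance minus a weighted redundancy penalty). The names `select_cluster`, `diversity_weight` and the sample profiles are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_cluster(me, candidates, k, diversity_weight):
    """Greedily pick k users for the cluster of `me`.

    Each candidate is scored by its relevance to `me` minus
    `diversity_weight` times its highest similarity to a user already
    chosen. With diversity_weight = 0 this is relevance-only selection.
    This scoring rule is an assumption made for illustration.
    """
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < k:
        def score(name):
            relevance = jaccard(me, remaining[name])
            redundancy = max(
                (jaccard(remaining[name], candidates[c]) for c in chosen),
                default=0.0,
            )
            return relevance - diversity_weight * redundancy

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        del remaining[best]
    return chosen


def coverage(candidates, chosen):
    """Number of distinct items reachable through the chosen users."""
    items = set()
    for name in chosen:
        items |= candidates[name]
    return len(items)


# Hypothetical profiles: "a" and "b" are both close to `me` and to each other;
# "c" is less similar to `me` but holds mostly different items.
me = {1, 2, 3, 4}
users = {
    "a": {1, 2, 3, 5},
    "b": {1, 2, 3, 6},
    "c": {3, 4, 7, 8, 9},
}

relevance_only = select_cluster(me, users, k=2, diversity_weight=0.0)
with_diversity = select_cluster(me, users, k=2, diversity_weight=1.0)

print(relevance_only, coverage(users, relevance_only))  # ['a', 'b'] 5
print(with_diversity, coverage(users, with_diversity))  # ['a', 'c'] 8
```

In this toy setting, the relevance-only cluster contains two near-duplicate users and reaches 5 distinct items, whereas the diversity-aware cluster of the same size reaches 8, which mirrors the recall effect described above.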