Paper 2

Top-k Queries over Distributed Uncertain Categorical Data

Authors: Adel Benaissa, Soror Sahri, Mourad Ouziri

Volume 43 (2020)

Abstract

Uncertain data arises in many modern applications including sensor networks, data integration, and information extraction. Often this data is distributed and there is a need to do efficient query processing over the data in situ. We focus on answering top-k queries and propose a distributed algorithm TDUD, to efficiently answer top-k queries over distributed uncertain categorical data in queries single round of communication. TDUD uses a distributed index structure composed of local uncertain indexes (LUIs) on local sites and a single global uncertain index (GUI) on a coordinator site. Our algorithm minimizes the amount of communication needed to answer a top-k query by maintaining the mean sum dispersion of the probability distribution on each site. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time. We show empirically that TDUD is near-optimal in that it can typically retrieve the top-k query answers by communicating only k tuples in a single round.

Keywords: Distributed databases, Distributed indexing, Uncertain data, Top-k query.