Paper 4

Slicing the Dimensionality: Top-k Query Processing for High-dimensional Spaces

Authors: Gheorghi Guzun, Joel Tosado, and Guadalupe Canahuate

Volume 14 (2014)

Abstract

Top-k (preference) queries are used in several domains to retrieve the set of k tuples that more closely match a given query. For high-dimensional spaces, evaluation of top-k queries is expensive, as data and space partitioning indices perform worse than sequential scan. An al- ternative approach is the use of sorted lists to speed up query evaluation. This approach extends performance gains when compared to sequential scan to about ten dimensions. However, data-sets for which preference queries are considered, often are high-dimensional. In this paper, we ex- plore the the use of bit-sliced indices (BSI) to encode the attributes or score lists and perform top-k queries over high-dimensional data using bit-wise operations. Our approach does not require sorting or random ac- cess to the index. Additionally, bit-sliced indices require less space than other type of indices. The size of the bit-sliced index (without using com- pression) for a normalized data-set with 3 decimals is 60 times smaller than the size of sorted lists. Furthermore, our experimental evaluation shows that the use of BSI for top-k query processing is more ecient than Sequential Scan for high-dimensional data. When compared to Se- quential Top-k Algorithm (STA), BSI is one order of magnitude faster.