Paper 3
Global Paradigm for Designing Parallel Relational Data Warehouses in Distributed Environments
Authors: Soumia Benkrid, Ladjel Bellatreche, Alfredo Cuzzocrea |
Abstract: Designing a Parallel Relational Data Warehouse (PRDW) consists of a set of tasks: (i) choosing the hardware architecture; (ii) fragmenting the data warehouse schema; (iii) allocating the generated fragments; (iv) replicating fragments in order to ensure high performance; (v) defining the strategies for load balancing and query processing. The major drawback of this life-cycle is that it does not consider the inter-dependency among the sub-problems related to the design of a PRDW, and it makes use of heterogeneous metrics to evaluate the "quality" of the final design. In previous research efforts, we introduced an analytical cost model for parallel OLAP query processing in cluster environments. In a second study, we took into account the inter-dependency between fragmentation and allocation. In this paper, we propose a novel methodology, called F&A&R, which further extends these results and defines an approach where the main PRDW design phases (i.e., fragmentation, allocation, and replication) are performed simultaneously, in a global fashion. In particular, our approach determines whether the fragmentation pattern currently generated is relevant to the allocation process or not. An original method of supporting data replication, based on fuzzy k-means clustering, is also proposed and successfully integrated within the whole design framework. Finally, we experimentally assessed the performance of F&A&R against a well-known data warehouse benchmark, with very promising results. |
|
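The abstract only names fuzzy k-means as the basis for replication and gives no algorithmic detail. As a rough illustration of why a fuzzy clustering lends itself to replication, the following minimal sketch runs a textbook fuzzy c-means over fragments and places each fragment on every node (cluster) where its membership reaches a threshold. This is not the F&A&R algorithm: the feature vectors describing fragments, the `threshold` value, the initialization scheme, and all function names are assumptions made for this example.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership of each point in each cluster."""
    exp = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            result.append([h / total for h in hits])
            continue
        inv = [d ** -exp for d in dists]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centers


def fuzzy_c_means(points, k, m=2.0, iterations=50):
    """Run a fixed number of fuzzy c-means iterations (assumed init scheme)."""
    # Assumption: deterministic init from evenly spaced input points.
    centers = [tuple(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers


def placement(u, threshold=0.3):
    """Place each fragment on its best node plus any node above threshold."""
    plan = []
    for row in u:
        best = max(range(len(row)), key=row.__getitem__)
        nodes = {j for j, v in enumerate(row) if v >= threshold}
        nodes.add(best)  # every fragment is stored at least once
        plan.append(sorted(nodes))
    return plan


# Hypothetical 2-D feature vectors for five fragments, clustered onto 2 nodes.
fragments = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
u, _ = fuzzy_c_means(fragments, k=2)
plan = placement(u, threshold=0.3)
print(plan)  # [[0], [0], [1], [1], [0, 1]]
```

In this toy input the last fragment sits halfway between the two groups, so its membership is split roughly evenly and it is replicated on both nodes, while the other fragments are stored once. A hard k-means would force a single assignment and leave the replication decision to a separate step.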