Paper 4

A Parallel Incremental Frequent Itemsets Mining IFIN+: Improvement and Extensive Evaluation

Authors: Van Quoc Phuong Huynh, Josef Küng, Tran Khanh Dang

Volume 41 (2019)

Abstract

In this paper, we propose IFIN+, a shared-memory parallelization of the Frequent Itemsets Mining algorithm IFIN. Our motivation is that commodity processors now provide many physical computational units, and taking full advantage of them is a promising way to improve computational performance in single-machine environments. Portions of the serial version are improved in ways that increase efficiency and computational independence, which makes it convenient to design the parallel computation with the Work-Pool model, known as a good model for load balancing. We conducted extensive experiments on both synthetic and real datasets to evaluate IFIN+ against its serial version IFIN, the well-known algorithm FP-Growth, and two other state-of-the-art algorithms, FIN and PrePost+. The experimental results show that IFIN+ is the most efficient in terms of running time, especially when mining at different support thresholds within the same running session. Compared to its serial version, the performance of IFIN+ is significantly improved.

Keywords: Incremental, Parallel, Frequent Itemsets Mining, Data mining, Big Data, IPPC-Tree, IFIN, IFIN+.