Paper 6

Mining Multiple Related Data Sources Using Object-oriented Model

Authors: C.I. Ezeife and Dan Zhang

Volume 13 (2014)

Abstract

An object-oriented database is represented by a set of classes connected through their class inheritance hierarchy by superclass and subclass relationships. An object-oriented database is well suited to capturing the complexity of real-world data in comprehensive detail, such as multiple related tables representing the data schemas of a retail store web site, or multiple databases such as several retail store web sites. Modeling web and other data as a number of object database schemas would enable derived, historical, and comparative mining of multiple databases and tables. This paper proposes an object-oriented class model and database schema, and a series of class methods, including one for object-oriented join (OOJoin), for mining multiple data sources through an object-oriented model. The OOJoin procedure joins superclass and subclass tables by matching their type and supertype relationships. Mining Hierarchical Frequent Patterns (MineHFPs) from multiple integrated databases is done by applying an extended TidFP technique, which specifies the object class hierarchy by traversing the multiple database inheritance hierarchy. This paper also extends the map-gen join method used in the TidFP algorithm to oomap-gen join for generating k-itemset object candidate patterns. The oomap-gen join reduces the number of candidate itemsets generated by indexing the (k-1)-itemset candidate patterns with start and end position codes for the inheritance hierarchy level. Experimental results show that the proposed MineHFPs algorithm for mining hierarchical frequent patterns is effective and efficient for complex queries.
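The idea of joining superclass and subclass tables by matching type and supertype can be illustrated with a minimal sketch. This is not the paper's OOJoin procedure, whose details are not given in the abstract. The `ClassTable` structure, the `oo_join` function, the shared object identifier, and the sample `Computer` and `Laptop` data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClassTable:
    """Hypothetical table for one class in an inheritance hierarchy."""
    type_name: str                 # the class this table stores
    super_type: Optional[str]      # name of its superclass, or None for a root
    rows: Dict[int, dict]          # object id -> attribute values


def oo_join(sub: ClassTable, sup: ClassTable) -> Dict[int, dict]:
    """Join a subclass table with its superclass table.

    Assumption: the two tables are joinable only when the subclass's
    supertype matches the superclass's type, and rows are paired on a
    shared object id.
    """
    if sub.super_type != sup.type_name:
        raise ValueError(
            f"{sub.type_name} is not a direct subclass of {sup.type_name}"
        )
    joined = {}
    for oid, sub_attrs in sub.rows.items():
        sup_attrs = sup.rows.get(oid)
        if sup_attrs is None:
            continue  # no matching superclass row for this object
        # Inherited attributes first, then the subclass's own attributes.
        joined[oid] = {**sup_attrs, **sub_attrs}
    return joined


# Illustrative data only; not taken from the paper.
computer = ClassTable(
    "Computer", None,
    {1: {"brand": "A", "price": 900}, 2: {"brand": "B", "price": 1200}},
)
laptop = ClassTable("Laptop", "Computer", {1: {"weight_kg": 1.4}})

joined = oo_join(laptop, computer)
print(joined)
```

In this sketch each joined row carries both the inherited and the subclass-specific attributes, which is the kind of integrated record a hierarchical pattern miner could then consume.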