Paper 4

Parallel learning of local SVM algorithms for classifying large datasets

Authors: Thanh-Nghi Do and Francois Poulet

Volume 31 (2017)

Abstract

We propose new parallel learning algorithms of local support vector machines (local SVMs) for the effective non-linear classification of large datasets. The local SVM algorithms perform the training task on large datasets in two main steps: the first partitions the full dataset into k subsets of data, and the second learns non-linear SVMs from the k subsets to classify them locally, in parallel, on multi-core computers. The k local SVMs algorithm (kSVM) uses the k-means clustering algorithm to partition the data into k clusters, then constructs non-linear SVM models in parallel to classify the data clusters locally. The decision tree with labeling support vector machines (tSVM) uses the C4.5 decision tree algorithm to split the full dataset into terminal nodes, and then learns local SVM models in parallel to classify impure terminal nodes containing a mixture of labels. The krSVM algorithm trains a random ensemble of kSVM models. Numerical test results on 4 datasets from the UCI repository, 3 benchmarks of handwritten letter recognition, and a color image collection of one thousand small objects show that our proposed local SVM algorithms (kSVM, tSVM, krSVM) are efficient compared with the standard SVM (LibSVM) in terms of training time and accuracy when dealing with large datasets.
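To make the two-step local SVM idea concrete, the following is a minimal sketch of a kSVM-style pipeline, assuming a scikit-learn/joblib setting rather than the authors' actual implementation; the function names (train_ksvm, predict_ksvm), the RBF kernel choice, and the handling of pure clusters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def train_ksvm(X, y, k=10, n_jobs=-1, **svm_params):
    """Sketch of kSVM: partition with k-means, then fit one local non-linear SVM per cluster in parallel."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    labels = km.labels_

    def fit_one(c):
        idx = np.where(labels == c)[0]
        yc = y[idx]
        # Assumption: a pure cluster (single class) needs no SVM; just remember its label.
        if len(np.unique(yc)) == 1:
            return ("pure", yc[0])
        return ("svm", SVC(kernel="rbf", **svm_params).fit(X[idx], yc))

    # Train the k local models in parallel on a multi-core machine.
    models = Parallel(n_jobs=n_jobs)(delayed(fit_one)(c) for c in range(k))
    return km, models


def predict_ksvm(km, models, X):
    """Route each test point to its nearest cluster and use that cluster's local model."""
    clusters = km.predict(X)
    out = np.empty(len(X), dtype=object)
    for c in np.unique(clusters):
        idx = np.where(clusters == c)[0]
        kind, model = models[c]
        out[idx] = model.predict(X[idx]) if kind == "svm" else model
    return out
```

A tSVM-style variant would replace the k-means partitioning with a C4.5-like decision tree and attach local SVMs only to impure leaves, and krSVM would bag several such kSVM models trained on random samples; those variants are not shown here.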