Paper 6

Facilitating and Managing Machine Learning and Data Analysis Tasks in Big Data Environments Using Web and Microservice Technologies

Authors: Shadi Shahoud, Sonja Gunnarsdottir, Hatem Khalloof, Clemens Duepmeier, Veit Hagenmeyer

Volume 45 (2020)

Abstract

Driven by the current advances of machine learning in a wide range of application areas, the need for developing easy to use frameworks for instrumenting machine learning effectively for non data analytics experts as well as novices increased dramatically. Furthermore, building machine learning models in the context of Big Data environments still represents a great challenge. In the present article, those challenges are addressed by introducing a new generic framework for efficiently facilitating the training, testing, managing, storing and retrieving of machine learning models in the context of Big Data. The framework makes use of a powerful Big Data software stack platform, web technologies and a microservice architecture for a fully manageable and highly scalable solution. A highly configurable user interface hiding platform details from the user is introduced giving the user the ability to easily train, test and manage machine learning models. Moreover, the framework automatically indexes and characterizes models and allows flexible exploration of them in the visual interface. The performance and usability of the new framework is evaluated on state-of-the-arts machine learning algorithms: it is shown that executing, storing and retrieving machine learning models via the framework results in a well acceptable low overhead demonstrating that the framework can provide an efficient approach for facilitating machine learning in Big Data environments. It is also evaluated, how configuration options (e.g. caching of RDDs in Apache Spark) affect runtime performance. Furthermore, the evaluation provides indicators for when the utilization of distributed computing (i.e. parallel computation) based on Apache Spark on a cluster outperforms single computer execution of a machine learning model.

Keywords

Microservice, Web-based applications, Big Data, Data analytic, Machine Learning