Efficient Prototyping of Fault Tolerant Map-Reduce Applications with Docker-Hadoop

Prototyping and testing distributed systems is considered a hard task because it is not always possible to reproduce a given sequence of events. While simulations may help with this task, they cannot replace testing and validation on real systems. In this paper we present Docker-Hadoop, a container-based virtualization platform designed to prototype, test and deploy MapReduce applications and systems. This tool allowed us to test and reproduce fault-tolerance scenarios that are especially interesting in the context of the PER-MARE project, which aims at adapting the Hadoop framework to pervasive systems. In particular, we developed a fault-tolerant component that circumvents the limitations of the original Hadoop and prevents job scheduling from stalling in the case of failures or network disconnections. Thanks to Docker-Hadoop, we could easily prototype and test our improved Hadoop, and this paper presents the first scalability and speedup results.