BOLAS: Bipartite-Graph Oriented Locality-Aware Scheduling for MapReduce Tasks

BOLAS: Bipartite-Graph Oriented Locality-Aware Scheduling for MapReduce Tasks Task scheduling is critical to reduce the make span of MapReduce jobs. It is an effective approach for scheduling optimization by improving the data locality, which involves attempting to locate a task and its related data block on the same node. However, recent studies have been insufficient in addressing the locality issue. This paper proposes BOLAS, a MapReducetask scheduling algorithm, which models the scheduling processes a bipartite-graph matching problem trying best to assign data block to the nearest task. By considering the divergence of node performance of distribution of data blocks in MapReduce cluster, BOLAS can achieve a high degree of data locality, guarantee minimal data transfer during execution, and reduces a job’s makespan subsequently. As a dynamic algorithm, BOLAS solves the model using Kuhn-Munkres optimal matching algorithm, and can be deployed in either homogeneous or heterogeneous environments. In this study, BOLAS was implemented as a plug in for Hadoop, and the experimental results indicate that BOLAScan localize nearly 100% of the map tasks and reduce the execution time by up to 67.1%.