Job aware scheduling in Hadoop for heterogeneous cluster

A Hadoop cluster is designed to store and analyze large amounts of data in a distributed environment. With the ever-increasing use of Hadoop clusters, a scheduling algorithm is required to utilize cluster resources optimally. Existing scheduling algorithms suffer from one or more of the following problems: limited utilization of computing resources, limited applicability to heterogeneous clusters, random scheduling of non-local map tasks, and neglect of small jobs. In this paper, we propose a novel job-aware scheduling algorithm that overcomes these limitations. We analyze the performance of the proposed algorithm using the MapReduce WordCount benchmark. The experimental results show that the proposed algorithm increases resource utilization and reduces the average waiting time compared to the existing Matchmaking scheduling algorithm.