Towards Energy Efficiency in Heterogeneous Hadoop Clusters by Adaptive Task Assignment

The cost of powering servers, storage platforms, and their cooling systems has become a major component of operational costs in big data deployments. As a result, the design of energy-efficient Hadoop clusters has attracted significant research attention in recent years. However, existing studies do not consider how the complex interplay between workload heterogeneity and hardware heterogeneity affects energy efficiency. In this paper, we find that heterogeneity-oblivious task assignment approaches are detrimental to both the performance and the energy efficiency of Hadoop clusters. Importantly, we make a counterintuitive observation: even heterogeneity-aware techniques that focus on reducing job completion time do not necessarily guarantee energy efficiency. We propose E-Ant, a heterogeneity-aware task assignment approach that aims to minimize the overall energy consumption of a heterogeneous Hadoop cluster without sacrificing job performance. E-Ant adaptively schedules heterogeneous workloads on energy-efficient machines without a priori knowledge of workload properties. It also provides the flexibility to trade off energy efficiency against job fairness. E-Ant employs an ant colony optimization approach that rapidly generates task assignment solutions from per-task energy consumption feedback reported by Hadoop TaskTrackers. Experimental results on a heterogeneous cluster with varying hardware capabilities show that, for a synthetic workload from Microsoft, E-Ant improves overall energy savings by 17% compared to Fair Scheduler and by 12% compared to Tarazu.
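The abstract describes E-Ant only at a high level: an ant colony optimization loop in which each task's reported energy consumption steers future task placement. The sketch below illustrates that general feedback idea under stated assumptions. It keeps a pheromone value per (job, machine) pair, picks a machine with probability proportional to pheromone raised to an exponent `alpha`, evaporates all trails by a rate `rho` on each report, and deposits an amount `q / joules` on the machine that ran the task, so machines that finish a job's tasks with less energy become more likely to receive that job's tasks. The class `AntColonyAssigner`, its parameters, and this update rule are hypothetical illustrations of textbook ACO, not the paper's actual algorithm or Hadoop's API.

```python
import random
from collections import defaultdict


class AntColonyAssigner:
    """Hypothetical ACO-style task assigner (illustrative; not E-Ant itself)."""

    def __init__(self, machines, rho=0.1, alpha=1.0, q=100.0, seed=None):
        self.machines = list(machines)
        self.rho = rho      # assumed evaporation rate applied on every report
        self.alpha = alpha  # assumed exponent that sharpens pheromone differences
        self.q = q          # assumed deposit scale; deposit is q / energy
        # pheromone[job][machine], initialised uniformly so every machine starts equal
        self.pheromone = defaultdict(lambda: {m: 1.0 for m in self.machines})
        self.rng = random.Random(seed)

    def probabilities(self, job):
        """Normalised selection probability of each machine for this job."""
        trail = self.pheromone[job]
        weights = {m: trail[m] ** self.alpha for m in self.machines}
        total = sum(weights.values())
        return {m: w / total for m, w in weights.items()}

    def choose_machine(self, job):
        """Sample a machine for the job's next task (roulette-wheel selection)."""
        probs = self.probabilities(job)
        machines = list(probs)
        return self.rng.choices(machines, weights=[probs[m] for m in machines])[0]

    def report_energy(self, job, machine, joules):
        """Feedback step: evaporate all trails, then reward the machine that ran
        the task, inversely proportional to the energy it consumed."""
        if joules <= 0:
            raise ValueError("energy must be positive")
        trail = self.pheromone[job]
        for m in self.machines:
            trail[m] *= 1.0 - self.rho
        trail[machine] += self.q / joules


if __name__ == "__main__":
    # Hypothetical two-node cluster; the energy figures are made up for illustration.
    sched = AntColonyAssigner(["xeon", "atom"], seed=42)
    for _ in range(10):
        sched.report_energy("wordcount", "xeon", 200.0)
        sched.report_energy("wordcount", "atom", 100.0)
    print(sched.probabilities("wordcount"))
    print(sched.choose_machine("wordcount"))
```

Evaporation keeps the scheme adaptive: if a machine's energy profile for a job changes, old pheromone decays and fresh feedback dominates, which loosely mirrors the abstract's claim that no a priori knowledge of workload properties is needed. A fairness trade-off like the one the abstract mentions is not modeled here.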