Extending Hadoop’s Yarn Scheduler Load Simulator with a highly realistic network & traffic model

Research on accelerating big-data applications can be divided into job scheduling and flow scheduling. Job scheduling focuses on the temporal and spatial placement of jobs on execution units. Flow scheduling, on the other hand, concentrates on routing the flows that originate from actively running jobs. Although both aim to accelerate big-data applications, their views of the problem and the information available to them differ considerably. We propose a new simulation tool for evaluating ideas that jointly solve the job and flow scheduling problems for big-data applications. Our tool combines the Yarn Scheduler Load Simulator with the distributed network emulator MaxiNet. With our work, the interdependency between the network and the jobs running on top of it can be included in the evaluation of new ideas, supporting research on big-data applications with joint job and flow scheduling.