Building block components to control a data rate in the Apache Hadoop compute platform

Resource management is one of the most indispensable components of cluster-level infrastructure layers. Users of such systems should be able to specify their job requirements as configuration parameters (CPU, memory, disk I/O, network I/O), which the resource management function translates into appropriate resource reservation and allocation decisions. YARN is an emerging resource management framework in the Hadoop ecosystem that currently supports only memory and CPU reservation. In this paper, we propose a solution that takes into account the operation of the Hadoop Distributed File System to control the data rate of applications within a Hadoop compute platform. We exploit the property that a data pipe between a container and a DataNode consists of a disk I/O subpipe and a TCP/IP subpipe. We have implemented building block software components that control the data rate of data pipes between containers and DataNodes, and we provide a proof of concept with measurement results.
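The subpipe property can be illustrated with a minimal sketch. It assumes only that the two subpipes operate in series, so the sustained rate of a data pipe cannot exceed that of its slower subpipe, and capping either subpipe therefore caps the whole pipe. The class name `DataPipe`, its fields, and the capacity figures are hypothetical and chosen for illustration; they are not taken from the implementation described in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPipe:
    """Hypothetical model of one container-to-DataNode data pipe."""
    disk_mbps: float  # capacity of the disk I/O subpipe (illustrative value)
    net_mbps: float   # capacity of the TCP/IP subpipe (illustrative value)

    def effective_rate(self,
                       disk_cap: Optional[float] = None,
                       net_cap: Optional[float] = None) -> float:
        """Upper bound on sustained throughput, given optional per-subpipe caps."""
        disk = self.disk_mbps if disk_cap is None else min(self.disk_mbps, disk_cap)
        net = self.net_mbps if net_cap is None else min(self.net_mbps, net_cap)
        # The subpipes are in series, so the slower one bounds the whole pipe.
        return min(disk, net)


pipe = DataPipe(disk_mbps=120.0, net_mbps=110.0)
print(pipe.effective_rate())               # 110.0 (network-bound)
print(pipe.effective_rate(net_cap=40.0))   # 40.0  (capped on the TCP/IP side)
print(pipe.effective_rate(disk_cap=25.0))  # 25.0  (capped on the disk side)
```

Under this simplified model, a rate limit can be enforced on whichever subpipe is easier to control, since a cap on either side bounds the end-to-end data rate.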