RepFlow: Minimizing flow completion times with replicated flows in data centers

Short TCP flows that are critical for many interactive applications in data centers are plagued by long flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in long-tailed flow completion times (FCT). Previous work on reducing FCT usually requires custom switch hardware and/or protocol changes. We propose RepFlow, a simple yet practically effective approach that replicates each short flow to reduce its completion time, without any change to switches or host kernels. With ECMP, the original and replicated flows are likely to traverse distinct paths with different congestion levels, which reduces the probability of experiencing a long queueing delay. We develop a simple analytical model to demonstrate the potential improvement. Further, we evaluate RepFlow with NS-3 simulations and a Mininet implementation, and show that it provides a 50%-70% speedup in both mean and 99th percentile FCT for all loads, and offers near-optimal FCT when used with DCTCP.
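
The intuition behind replication can be illustrated with a toy calculation. The sketch below is not the analytical model of the paper; it assumes, purely for illustration, that each path is independently congested with probability `p_congested`, that a flow finishes in `fast_fct` on an uncongested path and `slow_fct` on a congested one, and that a replicated flow completes as soon as either copy does. All function names, parameters, and numeric values are hypothetical.

```python
import random


def slow_probability(p_congested, copies):
    """Probability that every copy of a flow lands on a congested path,
    assuming the paths are congested independently (an assumption)."""
    return p_congested ** copies


def simulate_fct(num_flows, p_congested, fast_fct, slow_fct, copies, seed=0):
    """Draw a completion time for each flow under the toy two-state model.

    Each copy independently sees a congested path with probability
    p_congested; the flow's FCT is the minimum over its copies.
    """
    rng = random.Random(seed)
    fcts = []
    for _ in range(num_flows):
        samples = [
            slow_fct if rng.random() < p_congested else fast_fct
            for _ in range(copies)
        ]
        fcts.append(min(samples))
    return fcts


def mean(values):
    return sum(values) / len(values)


def fraction_slow(fcts, slow_fct):
    """Fraction of flows whose completion time equals the congested value."""
    return sum(1 for t in fcts if t == slow_fct) / len(fcts)


# Hypothetical parameters: 20% of paths congested, 1 ms vs. 10 ms FCT.
P, FAST, SLOW, N = 0.2, 1.0, 10.0, 100_000
single = simulate_fct(N, P, FAST, SLOW, copies=1)
replicated = simulate_fct(N, P, FAST, SLOW, copies=2, seed=1)
```

Under these assumptions, the chance that a flow is slow falls from `p_congested` to `p_congested ** 2` when one replica is added, which is the effect the abstract attributes to the two copies seeing different congestion levels. Real paths are not independent and FCT is not two-valued, so the numbers here only indicate the direction of the improvement.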