Multithreaded implementation of data-intensive applications with overlapped I/O and computation

For data-intensive applications, the performance of parallel systems is limited by I/O performance rather than by CPU performance. Scientific computing and database applications demand large volumes of I/O, especially disk I/O. The widening gap between CPU and disk speeds limits the speedup achieved by parallelizing such I/O-intensive applications. Hence, reducing I/O overhead in parallel systems deserves special attention. In this paper, we introduce a technique that overlaps disk I/O with computation in an OpenMP-based parallel processing environment. The technique uses multiple threads that perform computation in parallel and a single thread that performs all I/O. Overlap is enabled by giving every computational thread its own prefetch buffer, which the I/O thread fills ahead of time so that the computational threads do not wait on the disk. We test the method on a weather data analysis application, and the experimental results show that this approach achieves a speedup of 1.75 over a single-threaded implementation. Our system achieves a speedup as high as 8.71 over the Hadoop implementation of the same application.
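The overlap scheme can be pictured as a producer-consumer pipeline: one I/O thread keeps per-thread prefetch buffers filled while the compute threads work on data that is already in memory. The block below is a minimal sketch under stated assumptions, not the authors' OpenMP implementation. Python's `threading` and `queue` modules stand in for OpenMP threads, one bounded `queue.Queue` per compute thread plays the role of its prefetch buffer, chunks are dealt round-robin, and the names `overlapped_map_reduce`, `read_chunks`, and `buffer_depth` are hypothetical. The "weather" workload (a maximum temperature over a list of readings) is likewise only illustrative.

```python
# Hypothetical sketch: Python threads stand in for the paper's OpenMP threads.
import queue
import threading
from typing import Callable, Iterable, Iterator, List, Sequence

_END = object()  # sentinel telling a compute thread that no more chunks will arrive


def overlapped_map_reduce(
    chunks: Iterable[Sequence[float]],
    num_workers: int,
    compute: Callable[[Sequence[float]], float],
    combine: Callable[[List[float]], float],
    buffer_depth: int = 2,  # assumed prefetch depth per compute thread
) -> float:
    """A single I/O thread fills per-worker prefetch buffers while workers compute."""
    # One bounded queue per compute thread acts as that thread's prefetch buffer.
    buffers = [queue.Queue(maxsize=buffer_depth) for _ in range(num_workers)]
    partials: List[List[float]] = [[] for _ in range(num_workers)]

    def io_thread() -> None:
        # Only this thread consumes the input iterator (i.e., performs the "reads").
        # put() blocks only when the target buffer is full, so reading ahead
        # overlaps with the workers processing chunks fetched earlier.
        for i, chunk in enumerate(chunks):
            buffers[i % num_workers].put(chunk)
        for buf in buffers:
            buf.put(_END)

    def worker(wid: int) -> None:
        # Each worker reads only from its own buffer and writes only its own partials.
        while True:
            chunk = buffers[wid].get()
            if chunk is _END:
                break
            partials[wid].append(compute(chunk))

    threads = [threading.Thread(target=io_thread)]
    threads += [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge per-chunk results after all threads have finished.
    return combine([p for ps in partials for p in ps])


def read_chunks(records: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    # Stand-in for sequential disk reads of fixed-size blocks.
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


temps = [12.5, 30.1, -4.0, 27.3, 35.8, 19.2, 8.7]
print(overlapped_map_reduce(read_chunks(temps, 2), num_workers=3, compute=max, combine=max))  # 35.8
```

Because `put()` blocks only when a worker's buffer is full, the I/O thread keeps reading ahead while the workers compute. With strict round-robin dealing, however, one slow worker can stall the I/O thread even when other buffers have room. In CPython the global interpreter lock also limits parallel speedup for pure-Python computation, so this sketch illustrates the structure of the technique rather than its performance.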