An approach for pre-filtering images from big data sets

Big Data is expanding rapidly in many domains as our ability to generate and collect data grows with the rapid development of networking and data storage capabilities. The term refers to the collection and processing of complex, massive data sets. Knowledge discovery from such data sets is a major concern and requirement for many organizations. However, the analysis of Big Data poses enormous challenges because of its exponential growth. To address these challenges, the MapReduce framework is commonly employed. This paper proposes a general algorithmic design technique for image pre-filtering using the MapReduce framework. The primary focus of our algorithm is to reduce the size of the input data set by admitting only relevant and non-redundant images, selected according to various constraints, so that the images can be used quickly and efficiently in a distributed fashion. This approach achieves significant speedup over conventional techniques and yields a much smaller problem instance that can be solved with fewer resources.
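
As a rough illustration of the idea, the following minimal sketch simulates a map step and a reduce step in plain Python. It is not the algorithm proposed in the paper: the record layout, the size constraint (`MIN_WIDTH`, `MIN_HEIGHT`), the function names, and the use of an exact content hash to detect redundant images are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict

# Hypothetical relevance constraint; the abstract does not specify the constraints.
MIN_WIDTH, MIN_HEIGHT = 64, 64


def map_image(record):
    """Map step: drop images that fail the constraint, key the rest by content hash."""
    name, width, height, data = record
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        return []  # treated as irrelevant, emits nothing
    key = hashlib.sha256(data).hexdigest()
    return [(key, name)]


def reduce_images(key, names):
    """Reduce step: keep one representative per group of byte-identical images."""
    return min(names)


def prefilter(records):
    """Run map, group by key (the shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, name in map_image(record):
            groups[key].append(name)
    return sorted(reduce_images(key, names) for key, names in groups.items())


if __name__ == "__main__":
    sample = [
        ("a.png", 100, 100, b"x"),
        ("b.png", 100, 100, b"x"),  # same bytes as a.png, redundant
        ("c.png", 10, 10, b"y"),    # too small, irrelevant
        ("d.png", 200, 200, b"z"),
    ]
    print(prefilter(sample))
```

In a real MapReduce deployment the grouping by key would be done by the framework's shuffle phase across many nodes, and redundancy would likely be judged by a similarity measure, not by exact byte equality.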