The performance evaluation of k-means by two MapReduce frameworks, Hadoop vs. Twister

In data mining, k-means is a method of cluster analysis that partitions observations into k clusters, assigning each observation to the cluster with the nearest mean. It has been used successfully in fields ranging from market segmentation, computer vision, geostatistics, and astronomy to agriculture. However, k-means-like clustering is difficult to express in the MapReduce model because of its iterative nature, which makes straggling map tasks highly likely. This paper presents the results of a performance evaluation of a k-means application running on the Twister and Hadoop frameworks. We describe how to design a MapReduce application that organizes the objects of a dataset into k partitions. This approach provides a way to cluster a dataset in parallel with a MapReduce framework such as Hadoop.
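One common way to express a k-means iteration in MapReduce can be sketched as follows. This is a minimal single-machine Python simulation under stated assumptions, not the paper's implementation and not the Hadoop or Twister API: the names `kmeans_map`, `kmeans_reduce`, and `kmeans_mapreduce` are hypothetical. The map phase assigns each point to its nearest centroid, the shuffle groups points by cluster id, and the reduce phase averages each group into a new centroid. The driver loop stands in for the iteration that the framework must orchestrate; in Hadoop this typically means launching one job per iteration, which is where the iterative cost discussed above arises.

```python
# Hypothetical sketch: simulates map, shuffle, and reduce in plain Python.
# It does not use any Hadoop or Twister API.
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def kmeans_map(point, centroids):
    """Map phase: emit (cluster_id, (point, 1)) for the nearest centroid."""
    return nearest(point, centroids), (point, 1)


def kmeans_reduce(cluster_id, values):
    """Reduce phase: average all points assigned to one cluster."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            total[d] += point[d]
        count += n
    return cluster_id, tuple(t / count for t in total)


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Driver: one simulated MapReduce round per iteration until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Shuffle: group mapper output by key (cluster id).
        groups = defaultdict(list)
        for p in points:
            key, value = kmeans_map(p, centroids)
            groups[key].append(value)
        # Empty clusters keep their previous centroid.
        new_centroids = list(centroids)
        for key, values in groups.items():
            cid, c = kmeans_reduce(key, values)
            new_centroids[cid] = c
        # Stop when no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

For example, `kmeans_mapreduce([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)], [(1.0, 1.0), (8.0, 8.0)])` converges after two rounds to centroids near `(1.17, 1.5)` and `(8.5, 8.5)`. Keying the mapper output by cluster id means each reducer only needs the points of one cluster, so the reduce work parallelizes across the k clusters.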