Counting occurrences of textual words in lecture video frames using Apache Hadoop Framework

In recent years, online lecture videos have become a significant pedagogical tool for both course instructors and students. Text present in a lecture video is an important modality for video retrieval because it is closely related to the video's content. In this paper, we present a distributed system that counts the occurrences of each textual word in video frames using the Apache Hadoop framework. Because Hadoop is well suited to batch processing and image processing is highly parallelizable, reading the text in each frame and counting the occurrences of each word can be implemented as a batch job in the MapReduce framework. We tested the text recognition and word count algorithms on Hadoop clusters of 1, 5, and 10 nodes, and compared the performance of the multi-node clusters with that of a single-node machine. On a data set of about 3 GB of lecture video frames, Hadoop with a 10-node cluster runs 5 times faster than a single-node system. Our results demonstrate the advantage of using Hadoop to speed up image and video processing applications.
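The word count step described above follows the usual MapReduce pattern, which the following minimal Python sketch illustrates in a single process. It is not the system's implementation: it assumes the text of each frame has already been recognized, and the frame identifiers, sample strings, and function names (`map_frame`, `shuffle`, `reduce_word`, `word_count`) are hypothetical. On a real cluster, Hadoop would distribute the map and reduce calls across nodes and perform the grouping itself.

```python
import re
from collections import defaultdict


def map_frame(frame_id, ocr_text):
    """Map step: emit a (word, 1) pair for every word recognized in one frame."""
    # frame_id is unused here; it stands in for the input key of a map task.
    for word in re.findall(r"[a-z0-9]+", ocr_text.lower()):
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_word(word, counts):
    """Reduce step: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(frames):
    """Run map, shuffle, and reduce over a dict of {frame_id: recognized_text}."""
    pairs = [
        pair
        for frame_id, text in frames.items()
        for pair in map_frame(frame_id, text)
    ]
    return dict(reduce_word(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical recognized text for three frames (assumed, not from the data set).
frames = {
    "frame_0001": "Hadoop MapReduce",
    "frame_0002": "MapReduce word count",
    "frame_0003": "word count on Hadoop",
}
counts = word_count(frames)
```

Because each frame is mapped without reference to any other frame, the map calls can run on different nodes in any order, which is the property that makes the workload suitable for a cluster.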