Parallel Processing System for Marathi Content Generation

The objective of the present work is to design a Hadoop-based parallel Marathi content retrieval system that uses clustering to produce more efficient and better-optimized results than existing systems. The system also provides personalized Marathi-language documents to end users based on interests identified from their browsing history, and it uses a time-session mechanism to re-rank documents so that the most relevant documents among the accessed pages are ranked first. Several authors have presented work on content retrieval using different categorization and clustering techniques, such as Naïve Bayes, neural networks, Support Vector Machines (SVM), LINGO, and Suffix Tree Clustering (STC), and for different languages, such as Tamil, Arabic, Polish, and Marathi. However, increasing the number of input documents severely degrades processing time and accuracy. The proposed system uses Hadoop-based clustering to distribute the clustering task over multiple DataNodes and execute it in parallel within the Hadoop framework, yielding optimized and efficient clustering of Marathi documents.
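
The idea of distributing the clustering task can be illustrated with a minimal sketch of one map/reduce-style clustering iteration. This is not the system's actual implementation: the abstract does not name the clustering algorithm, so a k-means-style assignment step with cosine similarity is assumed here, and the function names (vectorize, map_phase, reduce_phase), the whitespace tokenization, and the sample documents are all hypothetical. The sketch runs in a single Python process, with each inner list standing in for an input split that a separate DataNode would process.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (whitespace tokenization is an assumption)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_phase(split, centroids):
    """Mapper: emit (index of most similar centroid, document vector)."""
    for doc in split:
        vec = vectorize(doc)
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        yield best, vec


def reduce_phase(pairs, centroids):
    """Reducer: average the vectors of each cluster to get new centroids."""
    groups = defaultdict(list)
    for key, vec in pairs:
        groups[key].append(vec)
    new_centroids = list(centroids)  # clusters with no documents keep their centroid
    for key, vecs in groups.items():
        total = Counter()
        for v in vecs:
            total.update(v)
        new_centroids[key] = Counter({t: c / len(vecs) for t, c in total.items()})
    return new_centroids, groups


# Hypothetical sample data: two splits, each standing in for one node's input.
splits = [
    ["क्रिकेट सामना भारत", "शेती पाऊस पीक"],
    ["क्रिकेट खेळाडू सामना", "पाऊस शेती शेतकरी"],
]
centroids = [vectorize(splits[0][0]), vectorize(splits[0][1])]

# The map phase runs independently per split; the reduce phase merges results.
pairs = [pair for split in splits for pair in map_phase(split, centroids)]
new_centroids, groups = reduce_phase(pairs, centroids)
```

Because each mapper only needs its own split and the current centroids, the assignment step can run on every node at once; only the comparatively small per-cluster sums have to be brought together in the reduce step.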