Design and Implementation of the Hadoop-Based Crawler for SaaS Service Discovery

Software as a Service (SaaS) is the most adopted cloud service (46%), compared with Infrastructure as a Service (IaaS) (35%) and Platform as a Service (PaaS) (34%) [1]. Discovering a SaaS of interest across multiple cloud providers and review websites remains a significant challenge, especially when relying on general-purpose search engines (Google and Yahoo!) or on the search tools provided by existing review sites and directories. The process is time-consuming, requiring consumers to browse several websites before they can select an appropriate service. This paper addresses the efficient discovery of SaaS across review websites by developing the SaaS Nutch Hadoop-based Crawler Engine (SaaS Nhbased Crawler). The crawler crawls cloud review websites to find SaaS offerings of interest and enables the establishment of a central repository from which SaaS offerings can be discovered much more efficiently. The results show that the SaaS Nhbased Crawler can effectively crawl review websites and provide a list of the latest SaaS offerings.