Request Dependency Graph: A Model for Web Usage Mining in Large-scale Web of Things

In the Web of Things environment, web traffic logs contain valuable information about how people interact with smart devices and web servers. Mining the wealth of information in web access logs has theoretical and practical significance for important applications such as network optimization and security management. The first critical step of the mining task is modeling the relationships among HTTP requests for web objects in order to investigate the behavior of web clients. In this paper, we introduce the request dependency graph, a graph representation of the relationships among HTTP requests. Conceptually, a directed link from A to B in the graph means that the access to web object B is caused by the access to A, i.e., B depends on A. We propose a methodology to establish such a graph by mining the temporal and causal information among aggregated HTTP requests. To demonstrate the value and effectiveness of the proposed model, we design and implement an algorithm, based on the request dependency graph, for primary request identification, a critical task in web usage mining. Evaluation results from a large-scale real-world web access log show that the request dependency graph is a useful tool for web usage mining.
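
The idea of a directed link from A to B, and of using the graph to separate primary requests from dependent ones, can be illustrated with a minimal sketch. It is not the construction used in the paper, whose details are not given here. It assumes that a log is a list of hypothetical `(client, timestamp, url)` tuples, that temporal proximity within an assumed `window` (in seconds) stands in for the temporal and causal information, and that a request counts as dependent when an assumed fraction `min_support` of its occurrences follow the same predecessor. The function names `build_dependency_graph` and `primary_requests` are likewise illustrative.

```python
from collections import defaultdict


def build_dependency_graph(log, window=2.0):
    """Aggregate a log into weighted directed edges (A, B).

    log: iterable of (client, timestamp, url) tuples (assumed format).
    An edge A -> B is counted each time the same client requests B
    within `window` seconds after requesting A (assumed heuristic).
    """
    edge_counts = defaultdict(int)
    url_counts = defaultdict(int)
    by_client = defaultdict(list)

    for client, ts, url in log:
        by_client[client].append((ts, url))
        url_counts[url] += 1

    for requests in by_client.values():
        requests.sort()  # order each client's requests by time
        for i, (ts_a, url_a) in enumerate(requests):
            for ts_b, url_b in requests[i + 1:]:
                if ts_b - ts_a > window:
                    break  # later requests are even further away
                if url_a != url_b:
                    edge_counts[(url_a, url_b)] += 1

    return edge_counts, url_counts


def primary_requests(edge_counts, url_counts, min_support=0.5):
    """Return URLs that do not depend strongly on any other URL.

    B is treated as dependent on A when at least `min_support` of all
    requests for B followed a request for A (assumed threshold rule).
    """
    dependent = set()
    for (_, url_b), n in edge_counts.items():
        if n / url_counts[url_b] >= min_support:
            dependent.add(url_b)
    return sorted(u for u in url_counts if u not in dependent)


# Toy log: two clients load a page with embedded objects; one client
# later loads a second page that reuses the stylesheet.
log = [
    ("c1", 0.0, "/index.html"), ("c1", 0.3, "/style.css"), ("c1", 0.5, "/logo.png"),
    ("c2", 10.0, "/index.html"), ("c2", 10.2, "/style.css"), ("c2", 10.4, "/logo.png"),
    ("c1", 30.0, "/about.html"), ("c1", 30.2, "/style.css"),
]
graph, counts = build_dependency_graph(log)
primary = primary_requests(graph, counts)
print(primary)  # ['/about.html', '/index.html']
```

In this toy log the stylesheet and the image consistently follow `/index.html`, so they are classified as dependent, while the two pages remain as primary requests. Aggregating edge counts across clients, rather than judging each request pair in isolation, is what lets a weak edge such as `/about.html` to `/style.css` fall below the threshold.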