Using Copulas in Data Mining Based on the Observational Calculus

The objective of the paper is to contribute to data mining within the framework of the observational calculus by introducing generalized quantifiers related to copulas. Fitting copulas to multidimensional data is an increasingly important method for analyzing dependencies, and the proposed quantifiers of the observational calculus assess the results of estimating the structure of joint distributions of continuous variables by means of hierarchical Archimedean copulas. To this end, the paper slightly extends the existing theory of hierarchical Archimedean copulas. It proves that sufficient conditions for the function defining a hierarchical Archimedean copula to be a copula, which had previously been rigorously established only for the special case of fully nested Archimedean copulas, hold in general. These conditions make it possible to define three new generalized quantifiers, which are then thoroughly validated on four benchmark data sets and one data set from a real-world application. The paper concludes by comparing the proposed quantifiers to a more traditional approach, maximum weight spanning trees.
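To make the notion of a hierarchical Archimedean copula and a sufficient nesting condition concrete, the following minimal sketch assumes every node uses the Gumbel family with generator (-ln t)^θ, θ ≥ 1. For this single-family case, a known sufficient condition is that each child node's parameter is at least its parent's. The sketch does not reproduce the paper's general conditions, which may be stated differently. The names `Node`, `gumbel`, `satisfies_nesting`, and `evaluate` are hypothetical and used only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """A node of a hypothetical Gumbel HAC tree.

    Children are either nested Nodes or integer indices of input variables.
    """
    theta: float
    children: List[Union["Node", int]]


def gumbel(theta: float, us: List[float]) -> float:
    """d-dimensional Gumbel copula: exp(-(sum (-ln u_i)^theta)^(1/theta))."""
    s = sum((-math.log(u)) ** theta for u in us)
    return math.exp(-(s ** (1.0 / theta)))


def satisfies_nesting(node: Node) -> bool:
    """Check the assumed sufficient condition for a single-family Gumbel HAC:
    theta >= 1 at every node, and each child node's theta >= its parent's."""
    if node.theta < 1.0:
        return False
    for child in node.children:
        if isinstance(child, Node):
            if child.theta < node.theta or not satisfies_nesting(child):
                return False
    return True


def evaluate(node: Node, u: List[float]) -> float:
    """Evaluate the HAC bottom-up: inner copulas become arguments of outer ones."""
    args = [evaluate(c, u) if isinstance(c, Node) else u[c] for c in node.children]
    return gumbel(node.theta, args)


# A partially nested (not fully nested) structure: ((u0, u1), (u2, u3)).
tree = Node(1.5, [Node(2.0, [0, 1]), Node(3.0, [2, 3])])
print(satisfies_nesting(tree))            # both children have larger theta
print(evaluate(tree, [0.3, 0.6, 0.5, 0.8]))
```

The tree above is not fully nested, which is the situation where, according to the abstract, the sufficient conditions had not previously been rigorously established.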