Article,

A Fine-Grained XML Structural Comparison Approach

, , and .
Conceptual Modeling - ER 2007, (2008)

Abstract

As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several structural similarity aspects, i.e., sub-tree related similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an improved comparison method to deal with fine-grained sub-trees and leaf node repetitions, without increasing overall complexity with respect to current XML comparison methods. Our approach consists of two main algorithms: one for discovering the structural commonality between sub-trees and one for computing the costs of tree-based edit operations. A prototype has been developed to evaluate the optimality and performance of our method. Experimental results on both real and synthetic XML data demonstrate better performance with respect to alternative XML comparison methods.
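To make the baseline the abstract refers to concrete, here is a minimal sketch of a classic tree edit distance over ordered labeled trees. It is a Selkow-style top-down distance with unit costs, chosen only for illustration; it is not the authors' algorithm. The helper names (`Node`, `from_xml`, `tree_distance`) are hypothetical, and the sketch assumes that only element tags matter (text and attributes are ignored), that relabeling a node costs 1, and that inserting or deleting a whole sub-tree costs its node count.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    """Hypothetical ordered labeled tree node; frozen so it is hashable for caching."""
    label: str
    children: tuple = ()


def from_xml(text: str) -> Node:
    """Model an XML document as an ordered labeled tree of element tags (text/attributes ignored)."""
    def build(elem: ET.Element) -> Node:
        return Node(elem.tag, tuple(build(child) for child in elem))
    return build(ET.fromstring(text))


@lru_cache(maxsize=None)
def size(t: Node) -> int:
    """Number of nodes in the sub-tree rooted at t (assumed insert/delete cost)."""
    return 1 + sum(size(c) for c in t.children)


@lru_cache(maxsize=None)
def tree_distance(a: Node, b: Node) -> int:
    """Selkow-style top-down edit distance with unit relabel cost."""
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # d[i][j]: cost of transforming the first i children of a into the first j children of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])  # delete whole sub-tree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])  # insert whole sub-tree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),
                d[i][j - 1] + size(b.children[j - 1]),
                d[i - 1][j - 1] + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + d[m][n]


a = from_xml("<paper><title/><author/><author/></paper>")
b = from_xml("<paper><title/><author/></paper>")
c = from_xml("<article><title/></article>")
print(tree_distance(a, b))  # deleting the repeated <author/> leaf costs 1
print(tree_distance(a, c))  # relabel root (1) + delete two <author/> leaves (2)
```

This baseline charges the full size of a sub-tree whenever it is inserted or deleted, regardless of whether similar structure (for example, a repeated leaf or a matching sub-tree) already occurs in the other document. Accounting for that kind of sub-tree commonality is what the abstract says the paper's approach adds.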
