Mining Flexible Association Rules from XML

Caneva, E.; Oliboni, Barbara; Quintarelli, E.

doi:10.1145/1698790.1698805

The eXtensible Markup Language (XML) is playing an increasingly important role in research fields that focus on the representation, exchange, and integration of information. This information comes from different data sources and relates to various contexts, for example medical and biological data. Extracting knowledge from XML datasets is an important issue, and it may be difficult because XML is intrinsically semistructured: documents can have an implicit and irregular structure that is not defined in advance. In this paper, we propose a novel approach for discovering frequent, but approximate, information in XML documents. The approach is based on Flexible Tree Rules, which take into account both the structure and the content of the analyzed data. Our proposal is flexible enough to be adapted both to documents with a regular structure and to documents with a highly heterogeneous structure, and it can be used to evaluate the similarity of XML documents. Moreover, we describe an algorithm to evaluate the similarity degree of a Flexible Tree Rule with respect to an XML document.