Mining Flexible Association Rules from XML

Barbara Oliboni; E. Quintarelli
2009-01-01

Abstract

The role of the eXtensible Markup Language (XML) is becoming very important in the research fields focusing on the representation, the exchange, and the integration of information coming from different data sources and containing information related to various contexts such as, for example, medical and biological data. Extracting knowledge from XML datasets is an important issue that may be difficult because of the semistructured intrinsic nature of XML; indeed, documents can have an implicit and irregular structure, not defined in advance. In this paper, we propose a novel approach for discovering frequent, but approximate, information in XML documents, based on Flexible Tree Rules taking into account both structure and content of the analyzed data. Our proposal is flexible enough to be adapted to both documents with a regular structure and documents with a highly heterogeneous structure, and can be used to evaluate the similarity of XML documents. Moreover, we describe an algorithm to evaluate the similarity degree of a Flexible Tree Rule with respect to an XML document.
2009
9781605586502
Association rules; XML; Data Mining

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/342335