Anomaly Detection in XML databases by means of Association Rules
E. QUINTARELLI; R. ROSSATO
2007-01-01
Abstract
Anomaly detection serves two purposes: discovering interesting exceptions and identifying incorrect data in large datasets. Since anomalies are rare events that violate the frequent relationships among data, we propose a method that first detects frequent relationships and then extracts the anomalies. The RADAR (Research of Anomalous Data through Association Rules) method uses data mining techniques to extract frequent "rules" from datasets, in the form of quasi-functional dependencies. Such dependencies are extracted by means of association rules. Given a quasi-functional dependency, the associated anomalies can be discovered by querying either the original database or the previously mined association rules. Analysis of these anomalies can either reveal erroneous data or highlight novel information, that is, significant outliers with respect to the frequent rules. The method requires no prior knowledge and infers rules directly from the data. Experiments performed on real XML databases are reported to show the applicability and effectiveness of the proposed approach.
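The idea of mining a frequent relationship and then reading off its violations can be illustrated with a minimal sketch. This is not the RADAR implementation: it assumes flat records (Python dicts) rather than XML, a single-attribute dependency, and hypothetical names and thresholds (`find_anomalies`, `min_support`, `min_confidence`) chosen only for illustration. A rule `lhs = x -> rhs = y` is kept when it is frequent and almost always holds; records that match `x` but disagree on `y` are reported as anomalies.

```python
from collections import Counter, defaultdict


def find_anomalies(records, lhs, rhs, min_support=0.2, min_confidence=0.8):
    """Mine rules lhs=x -> rhs=y and return (rules, anomalies).

    Illustrative simplification: support is the fraction of all records
    matching both sides of the rule, confidence is the fraction of records
    with lhs=x that also have rhs=y. Thresholds are assumed values.
    """
    n = len(records)
    # For every value of lhs, count how often each value of rhs co-occurs.
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[lhs]][rec[rhs]] += 1

    rules = {}
    for x, ys in counts.items():
        y, hits = ys.most_common(1)[0]   # dominant rhs value for this x
        total = sum(ys.values())         # number of records with lhs == x
        support = hits / n
        confidence = hits / total
        if support >= min_support and confidence >= min_confidence:
            rules[x] = (y, support, confidence)

    # Anomalies: records covered by a rule's left side that break its right side.
    anomalies = [
        rec for rec in records
        if rec[lhs] in rules and rec[rhs] != rules[rec[lhs]][0]
    ]
    return rules, anomalies


# Toy data (invented for this example): city almost determines country.
records = (
    [{"city": "Milano", "country": "Italy"}] * 5
    + [{"city": "Milano", "country": "Spain"}]
    + [{"city": "Verona", "country": "Italy"}] * 4
)

rules, anomalies = find_anomalies(records, "city", "country")
print(anomalies)  # [{'city': 'Milano', 'country': 'Spain'}]
```

In this toy run the rule `Milano -> Italy` holds for 5 of the 6 Milano records, so the single deviating record is flagged. Whether such a record is an error or a genuinely interesting exception is left to later analysis, as the abstract describes.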