An Empirical Characterization of the Stability of Isolation Forest Results
Azzari, Alberto; Bicego, Manuele
2024-01-01
Abstract
Isolation Forests (IForest), a variant of Random Forests tailored for anomaly detection, operate by isolating points through recursive partitioning. They are widely used and have received numerous enhancements to splitting rules, training schemes, and anomaly scoring. One aspect is nonetheless often overlooked: because IForests are inherently random, their results may not be stable. Surprisingly, most studies and empirical evaluations report results from a single execution or from the average of a few executions, potentially overlooking significant variability caused by this randomness. This paper presents a detailed investigation of the stability of IForest outcomes, providing empirical evidence that results may differ substantially across runs. Drawing on concepts from the field of Ensemble Classifiers, we propose a possible explanation for this instability and a strategy to mitigate it. Although we limit our examination to the original IForest model, with standard parameters and the datasets used in the foundational papers, our study underscores the importance of accounting for the random nature of IForests and offers insights and recommendations for practitioners.
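The abstract describes IForest as isolating points through recursive random partitioning. It also warns that reporting a single run can hide run-to-run variability. The Python sketch below illustrates both ideas under stated assumptions. It implements the original algorithm as described by Liu et al.: random axis-aligned splits on subsamples of size ψ = 256, a height limit of ⌈log₂ ψ⌉, and the anomaly score s(x, ψ) = 2^(−E[h(x)]/c(ψ)). It then repeats the whole fit with several random seeds and measures the spread of ROC AUC. The toy dataset, the function names (`build_tree`, `iforest_scores`, `make_data`, and so on), and the seeds are hypothetical and not taken from the paper. This is an illustration of the phenomenon, not the authors' experimental protocol.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (Liu et al.)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def build_tree(points, height, limit, rng):
    """Recursively partition `points` with random axis-aligned splits."""
    if height >= limit or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))          # random attribute
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen attribute is constant
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)                  # random split value in [lo, hi]
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, height + 1, limit, rng),
            build_tree(right, height + 1, limit, rng))


def path_length(x, tree, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for the unbuilt subtree."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def iforest_scores(data, n_trees=100, psi=256, seed=0):
    """Fit a forest with the given seed and return s(x, psi) for every point."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, limit, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-statistics.fmean(path_length(x, t) for t in trees) / norm)
            for x in data]


def auc(scores, labels):
    """Probability that a random anomaly outscores a random normal point (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_data(seed=42):
    """Hypothetical toy data: a Gaussian cluster plus scattered uniform anomalies."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    anomalies = [(rng.uniform(-6, 6), rng.uniform(-6, 6)) for _ in range(25)]
    return normal + anomalies, [0] * 500 + [1] * 25


if __name__ == "__main__":
    data, labels = make_data()
    for n_trees in (10, 100):
        # Same data, same hyperparameters; only the forest's random seed changes.
        aucs = [auc(iforest_scores(data, n_trees=n_trees, seed=s), labels)
                for s in range(10)]
        print(f"t={n_trees:3d}  AUC min={min(aucs):.3f}  max={max(aucs):.3f}  "
              f"stdev={statistics.stdev(aucs):.4f}")
```

Each forest gets its own `random.Random(seed)`, so a given seed is exactly reproducible, while different seeds expose run-to-run spread. Comparing the spread for `t=10` and `t=100` gives a feel for how ensemble size interacts with this variability. This comparison is only an illustration and is not necessarily the mitigation strategy proposed in the paper.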