An Interesting Property of Random Forest Distances with Respect to the Curse of Dimensionality
Bicego, Manuele; Cicalese, Ferdinando
2024-01-01
Abstract
Random forest distances (RF-distances) are a powerful class of data-dependent similarity measures whose usefulness has been shown in many different scenarios. In this paper, we discuss an interesting property of these measures with respect to the curse of dimensionality, i.e., the set of problems that may arise when the dimensionality of the feature space is too large relative to the number of available objects. Starting from a recent theoretical characterization of two RF-distances defined on an ensemble of Extremely Randomized Trees (ERT), we provide empirical evidence that such distances are indeed robust to the curse of dimensionality, and that their performance improves as the dimensionality of the space increases. Further, we empirically show that this behavior is not restricted to ERT-based RF-distances but also holds, in general, with alternative training schemes.