In this paper we study the poorly investigated problem of learning Random Forests for distance-based Random Forest clustering. We studied both classic schemes as well as alternative approaches, novel in this context. In particular, we investigated the suitability of Gaussian Density Forests [1], Random Forests specifically designed for density estimation. Further, we introduce a novel variant of Random Forest, based on an effective non parametric by-pass estimator of the Renyi entropy, which can be useful when the parametric assumption is too strict. An empirical evaluation involving different datasets and different RF-clustering strategies confirms that the learning step is crucial for RF-clustering. We also present a set of practical guidelines useful to determine the most suitable variant of RF-clustering according to the problem under examination.
On learning Random Forests for Random Forest-clustering
Bicego, M;
2021-01-01
Abstract
In this paper we study the poorly investigated problem of learning Random Forests for distance-based Random Forest clustering. We studied both classic schemes as well as alternative approaches, novel in this context. In particular, we investigated the suitability of Gaussian Density Forests [1], Random Forests specifically designed for density estimation. Further, we introduce a novel variant of Random Forest, based on an effective non parametric by-pass estimator of the Renyi entropy, which can be useful when the parametric assumption is too strict. An empirical evaluation involving different datasets and different RF-clustering strategies confirms that the learning step is crucial for RF-clustering. We also present a set of practical guidelines useful to determine the most suitable variant of RF-clustering according to the problem under examination.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.