On the Good Behaviour of Extremely Randomized Trees in Random Forest-Distance Computation

Bicego, Manuele; Cicalese, Ferdinando

doi:10.1007/978-3-031-43421-1_38

Originally introduced in the context of supervised classification, ensembles of Extremely Randomized Trees (ERT) have shown to provide surprisingly effective models also in unsupervised settings, e.g., for anomaly detection (via Isolation Forests) and for distance computation. In this paper, we focus on this latter application of ERT, namely in the context of Random Forest (RF) distance computation. We aim at narrowing the gap between the established empirical evidence of the good behaviour of ERT and the still limited theoretical understanding of their (somehow) surprisingly good performance when compared to more involved methodologies. Our main contribution is the following: we assume the existence of a proper representation on a given domain, i.e., a vectorial representation of the objects which satisfies the Compactness Hypothesis formulated by Arkadev and Braverman in 1967. Under such a hypothesis, given the "true" distance between two objects, we show how to derive a bound on the approximation guaranteed by two main RF-distances obtained by employing ensembles of ERTs, with respect to such "true" distance. In other words, we show that there exists a constant c such that if two objects are epsilon-close in the true distance, then with high probability they are (c center dot epsilon)-close in the RF-distances computed with ERT forests.

On the Good Behaviour of Extremely Randomized Trees in Random Forest-Distance Computation

Bicego, Manuele;Cicalese, Ferdinando

2023-01-01

Abstract

Originally introduced in the context of supervised classification, ensembles of Extremely Randomized Trees (ERT) have shown to provide surprisingly effective models also in unsupervised settings, e.g., for anomaly detection (via Isolation Forests) and for distance computation. In this paper, we focus on this latter application of ERT, namely in the context of Random Forest (RF) distance computation. We aim at narrowing the gap between the established empirical evidence of the good behaviour of ERT and the still limited theoretical understanding of their (somehow) surprisingly good performance when compared to more involved methodologies. Our main contribution is the following: we assume the existence of a proper representation on a given domain, i.e., a vectorial representation of the objects which satisfies the Compactness Hypothesis formulated by Arkadev and Braverman in 1967. Under such a hypothesis, given the "true" distance between two objects, we show how to derive a bound on the approximation guaranteed by two main RF-distances obtained by employing ensembles of ERTs, with respect to such "true" distance. In other words, we show that there exists a constant c such that if two objects are epsilon-close in the true distance, then with high probability they are (c center dot epsilon)-close in the RF-distances computed with ERT forests.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Codice ISBN degli atti del congresso
	
				9783031434204
			
	Parole Chiave
	
				Random Forest-distances
Extremely Randomized Trees
Compactness hyphotesis
			
	Appare nelle tipologie:
	
				04.01 Contributo in atti di convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1125084

Citazioni

ND

3

2

CATALOGO DEI PRODOTTI DELLA RICERCA

On the Good Behaviour of Extremely Randomized Trees in Random Forest-Distance Computation

Bicego, Manuele;Cicalese, Ferdinando

2023-01-01

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

Citazioni

social impact

CATALOGO DEI PRODOTTI DELLA RICERCA

On the Good Behaviour of Extremely Randomized Trees in Random Forest-Distance Computation

Bicego, Manuele;Cicalese, Ferdinando

2023-01-01

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)