In this paper we present a novel Random Forest Clustering approach, called Dissimilarity Random Forest Clustering (DisRFC), which requires in input only pairwise dissimilarities. Thanks to this characteristic, the proposed approach is appliable to all those problems which involve non-vectorial representations, such as strings, sequences, graphs or 3D structures. In the proposed approach, we first train an Unsupervised Dis-similarity Random Forest (UD-RF), a novel variant of Random Forest which is completely unsupervised and based on dissimilarities. Then, we exploit the trained UD-RF to project the patterns to be clustered in a binary vectorial space, where the clustering is finally derived using fast and effective K-means procedures. In the paper we introduce different variants of DisRFC, which are thoroughly and positively evaluated on 12 different problems, also in comparison with alternative state-of-the-art approaches.(c) 2022 Elsevier Ltd. All rights reserved.

DisRFC: a dissimilarity-based Random Forest Clustering approach

Bicego, M
2023-01-01

Abstract

In this paper we present a novel Random Forest Clustering approach, called Dissimilarity Random Forest Clustering (DisRFC), which requires in input only pairwise dissimilarities. Thanks to this characteristic, the proposed approach is appliable to all those problems which involve non-vectorial representations, such as strings, sequences, graphs or 3D structures. In the proposed approach, we first train an Unsupervised Dis-similarity Random Forest (UD-RF), a novel variant of Random Forest which is completely unsupervised and based on dissimilarities. Then, we exploit the trained UD-RF to project the patterns to be clustered in a binary vectorial space, where the clustering is finally derived using fast and effective K-means procedures. In the paper we introduce different variants of DisRFC, which are thoroughly and positively evaluated on 12 different problems, also in comparison with alternative state-of-the-art approaches.(c) 2022 Elsevier Ltd. All rights reserved.
2023
Random forests clustering
Dissimilarities
Unsupervised learning
Clustering
Non -vectorial representation
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1086926
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact