Enhancing density-based clustering: Parameter reduction and outlier detection

Giugno Rosalba;
2013-01-01

Abstract

Clustering is a widely used unsupervised data mining technique. It identifies structure in collections of objects by grouping them into classes, called clusters, such that the similarity of objects within a cluster is maximized and the similarity of objects belonging to different clusters is minimized. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction set by the density. The basic structure of density-based clustering has some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting point; and (iii) adjacent clusters of different densities may not be properly identified. In this paper, we address all of the above problems. Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and ranks the objects of the original space accordingly. Next, it exploits this knowledge by projecting the original data into a higher-dimensional space and performs a density-based clustering that takes into account the reverse nearest neighbors of the objects. Our method also reduces the number of input parameters by providing a guideline for setting them suitably. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the most popular algorithms, DBSCAN and OPTICS, on all the standard benchmark datasets.
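
As a rough illustration of the reverse-nearest-neighbor idea mentioned in the abstract, the following minimal sketch counts, for each point, how many other points include it among their k nearest neighbors. It is not the paper's algorithm: the function names `knn` and `reverse_knn_counts`, the toy data, and the choice k = 2 are assumptions made here for illustration only, and the space stratification and higher-dimensional projection steps are not shown.

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Sort by Euclidean distance; ties are broken by index for determinism.
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return others[:k]

def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn(points, i, k):
            counts[j] += 1
    return counts

# Hypothetical toy data: a tight group, a looser group, and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group
    (5.0, 5.0), (6.0, 5.0), (5.0, 6.0),              # sparser group
    (20.0, 20.0),                                    # isolated point
]
counts = reverse_knn_counts(points, k=2)
for p, c in zip(points, counts):
    print(p, c)
```

In this toy example, members of both the dense and the sparse group are each chosen as a neighbor by at least two other points, while the isolated point is chosen by none. Unlike a fixed-radius density estimate, the count does not depend on the absolute scale of each group, which suggests why reverse nearest neighbors can help with clusters of different densities and with flagging outliers.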
2013
clustering; optimization; high dimensionality

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/940451
Citations
  • PMC: n/a
  • Scopus: 107
  • ISI: 78