Counterintuitive Behavior of Clustering Quality: Findings for K-Means on Synthetic and Real Data

Loog, Marco; Krijthe, Jesse H.; Bicego, Manuele

doi:10.1007/978-3-031-91398-3_12

Little is known about how the quality of a clustering changes when changing the size of the set used to determine the clustering model. We show that, for K-means clustering, the relationship between dataset size and clustering quality can display counterintuitive behavior. Notably, the quality can significantly deteriorate with more data to build the model. More generally, using artificial datasets and data from bioinformatics, we uncover a variety of learning curve behaviors for K-means. Our results clearly illustrate that the training sample size can have a nontrivial influence on the clustering performance. Our findings should appeal to both the clustering practitioner and the clustering researcher concerned with developing basic insights.

2025-01-01

Year: 2025
ISBN of the conference proceedings: 9783031913976
Keywords: Clustering quality; K-means; Counterintuitive behavior; Monotonicity; Gene ontology enrichment analysis
Appears in types: 04.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1161709
