CATALOGO DEI PRODOTTI DELLA RICERCA

We study the problem of optimizing the clustering of a set of vectors when the quality of the clustering is measured by the Entropy or the Gini impurity measure. Our results contribute to the state of the art both in terms of best known approximation guarantees and inapproximability bounds: (i) we give the first polynomial time algorithm for Entropy impurity based clustering with approximation guarantee independent of the number of vectors and (ii) we show that the problem of clustering based on entropy impurity does not admit a PTAS. This also implies an inapproximability result in information theoretic clustering for probability distributions closing a problem left open in [Chaudhury and McGregor, COLT08] and [Ackermann et al., ECCC11]. We also report experiments with a new clustering method that was designed on top of the theoretical tools leading to the above results. These experiments suggest a practical applicability for our method, in particular, when the number of clusters is large.

New results on information theoretic clustering

Ferdinando Cicalese;Eduardo Laber;Lucas Murtinho

2019-01-01

Abstract

We study the problem of optimizing the clustering of a set of vectors when the quality of the clustering is measured by the Entropy or the Gini impurity measure. Our results contribute to the state of the art both in terms of best known approximation guarantees and inapproximability bounds: (i) we give the first polynomial time algorithm for Entropy impurity based clustering with approximation guarantee independent of the number of vectors and (ii) we show that the problem of clustering based on entropy impurity does not admit a PTAS. This also implies an inapproximability result in information theoretic clustering for probability distributions closing a problem left open in [Chaudhury and McGregor, COLT08] and [Ackermann et al., ECCC11]. We also report experiments with a new clustering method that was designed on top of the theoretical tools leading to the above results. These experiments suggest a practical applicability for our method, in particular, when the number of clusters is large.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Codice ISBN degli atti del congresso
	
				978-151088698-8
			
	Parole Chiave
	
				Approximation algorithms, Algorithms, Facility location
			
	Appare nelle tipologie:
	
				04.01 Contributo in atti di convegno

File in questo prodotto:

File	Dimensione	Formato
cicalese19a.pdf solo utenti autorizzati Tipologia: Versione dell'editore Licenza: Accesso ristretto Dimensione 377.31 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	377.31 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1011981

Citazioni

ND

4

2

social impact