A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering
Liptak, Zsuzsanna
2004-01-01
Abstract
We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method consists of generating simulated ESTs with user-specified parameters and then evaluating the quality of the clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools for this purpose: ESTSim (EST Simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (Evaluator for CLusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, from which approximately 16,000 simulated ESTs were generated. In two experiments comparing subword-based dissimilarity measures to alignment-based ones, we obtained statistically significant results.
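The pipeline described above (simulate ESTs, then cluster them with an interchangeable dissimilarity measure, clustering algorithm, and validity index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ESTSim or ECLEST code: all function names are hypothetical, the simulator applies substitution errors only, and the choices of q-gram profile distance (subword-based), unit-cost Levenshtein distance (alignment-based), threshold single-linkage clustering, and the Rand index as validity index are illustrative stand-ins rather than the exact measures used in the paper. Note that a global edit distance is a crude measure for ESTs that overlap only partially; real EST clustering usually relies on local or overlap alignment.

```python
import random
from collections import Counter
from itertools import combinations

BASES = "ACGT"

def simulate_ests(cdna, n_ests, est_len, error_rate, rng):
    """Hypothetical ESTSim-like generator (assumption: substitutions only).
    Samples n_ests substrings of length est_len (requires est_len <= len(cdna))
    and replaces each base with a different base with probability error_rate."""
    ests = []
    for _ in range(n_ests):
        start = rng.randrange(0, len(cdna) - est_len + 1)
        fragment = list(cdna[start:start + est_len])
        for i, base in enumerate(fragment):
            if rng.random() < error_rate:
                fragment[i] = rng.choice([b for b in BASES if b != base])
        ests.append("".join(fragment))
    return ests

def qgram_distance(a, b, q=4):
    """Subword-based dissimilarity: L1 distance between q-gram count profiles."""
    pa = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    pb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(pa[k] - pb[k]) for k in pa.keys() | pb.keys())

def edit_distance(a, b):
    """Alignment-based dissimilarity: unit-cost global Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def single_linkage(seqs, dist, threshold):
    """Clustering: link any two sequences with dist <= threshold and return
    connected-component labels (union-find with path halving)."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]

def rand_index(labels_true, labels_pred):
    """External validity index: fraction of item pairs on which the true
    partition (source cDNA) and the computed clustering agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)

def evaluate(cdnas, dist, threshold, n_ests=5, est_len=60, error_rate=0.01, seed=0):
    """Simulate ESTs from each cDNA, cluster them with the given measure,
    and score the clustering against the known cDNA of origin."""
    rng = random.Random(seed)
    seqs, truth = [], []
    for label, cdna in enumerate(cdnas):
        for est in simulate_ests(cdna, n_ests, est_len, error_rate, rng):
            seqs.append(est)
            truth.append(label)
    return rand_index(truth, single_linkage(seqs, dist, threshold))

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: three random 120-nt "cDNAs"; thresholds are arbitrary.
    cdnas = ["".join(rng.choice(BASES) for _ in range(120)) for _ in range(3)]
    for name, dist, t in [("q-gram", qgram_distance, 40), ("edit", edit_distance, 30)]:
        print(name, round(evaluate(cdnas, dist, t), 3))
```

Because `evaluate` takes the dissimilarity function and threshold as arguments, running it with the same cDNAs and seed but with `qgram_distance` versus `edit_distance` compares a subword-based measure with an alignment-based one on identical simulated data. The clustering algorithm and validity index could be swapped in the same way; this sketch does not reproduce the paper's experiments or statistical tests.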