A fuzzy k-NN approach for cancer diagnosis with microarray gene expression data
C. Beyan
2008-01-01
Abstract
Recent advances in DNA microarray technology have made it possible to measure the expression levels of several thousand genes simultaneously. The gene expression profiles obtained from microarrays have opened the way to early cancer diagnosis using supervised learning algorithms. As a simple, effective, and nonparametric classification method, the k-Nearest Neighbor (k-NN) algorithm has recently been applied to cancer diagnosis and categorization. An obvious weakness of the traditional k-NN algorithm is that, when the density of the training data is uneven, classification precision may drop because the algorithm considers only which samples are the k nearest neighbors and ignores the differences in their distances. A recent solution to this problem adopts fuzzy set theory and constructs a new membership function based on similarities. This study was conducted to determine to what degree the fuzzification of the k-NN algorithm can improve the prediction accuracy of cancer classification based on gene expression data. Experiments on six distinct benchmark datasets spanning 27 diagnostic categories show that the fuzzy k-NN algorithm improves the accuracy of cancer classification to a certain degree. The results also encourage the use of this fuzzification technique on similar problems in computational biology.
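The contrast between crisp and fuzzy k-NN described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the membership function used in the study, so the code below assumes a common inverse-distance weighting with a fuzzifier `m`. The function names, the Euclidean distance, and the two-feature toy data are illustrative assumptions, not the paper's setup; real expression profiles have thousands of features.

```python
import numpy as np

def _nearest(X_train, x, k):
    """Return distances to all training samples and indices of the k nearest."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float), axis=1)
    return dists, np.argsort(dists)[:k]

def crisp_knn_predict(X_train, y_train, x, k=3):
    """Traditional k-NN: majority vote among the k nearest, distances ignored."""
    y_train = np.asarray(y_train)
    _, nearest = _nearest(X_train, x, k)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def fuzzy_knn_memberships(X_train, y_train, x, k=3, m=2.0, eps=1e-12):
    """Fuzzy k-NN (assumed form): class memberships weighted by inverse distance.

    Each of the k neighbors contributes weight 1 / d^(2/(m-1)) to its own class;
    memberships are normalized to sum to 1. Requires m > 1.
    """
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    dists, nearest = _nearest(X_train, x, k)
    weights = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + eps)  # eps avoids division by zero
    memberships = np.array([weights[y_train[nearest] == c].sum() for c in classes])
    return classes, memberships / weights.sum()

def fuzzy_knn_predict(X_train, y_train, x, k=3, m=2.0):
    """Assign the class with the highest fuzzy membership."""
    classes, memberships = fuzzy_knn_memberships(X_train, y_train, x, k=k, m=m)
    return classes[np.argmax(memberships)]

# Toy data with uneven density: one very close sample, two distant ones.
X_toy = [[0.1, 0.0], [5.0, 0.0], [0.0, 5.0]]
y_toy = ["tumor", "normal", "normal"]
query = [0.0, 0.0]

print(crisp_knn_predict(X_toy, y_toy, query, k=3))  # majority vote picks "normal"
print(fuzzy_knn_predict(X_toy, y_toy, query, k=3))  # distance weighting picks "tumor"
```

In this toy case the single close neighbor outweighs the two distant ones once distances are taken into account, which is the situation the abstract describes for unevenly distributed training data. The memberships also give a graded confidence rather than a bare label.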