Recent advances in DNA microarray technology have made it possible to measure the expression level of several thousand of genes simultaneously. The gene expression profiles obtained from microarray techniques have provided the opportunity of early diagnosis of cancer with the use of supervised learning algorithms. As a simple, effective and nonparametric classification method, k-Nearest Neighbor (k-NN) algorithm has recently been applied for the problem of cancer diagnosis and categorization. An obvious problem of traditional k-NN algorithm is that, when the density of training data is uneven, the precision of classification may reduce due to the consideration of first k nearest neighbors but not the differences of distances. A recent solution for this problem is adopting the theory of fuzzy sets and constructing a new membership function based on the similarities. This study has been conducted to demonstrate in what degree the fuzzification of k-NN algorithm can improve the prediction accuracy of cancer classification based on gene expression data. According to the results of the experiments over a six distinct benchmarking dataset spanning 27 diagnostic categories, it reveals that the fuzzy k-NN algorithm promotes the accuracy of cancer classification to a certain degree. Results also encourage the use of this fuzzification technique on similar problems in computational biology.

A fuzzy k-NN approach for cancer diagnosis with microarray gene expression data

C. Beyan;
2008-01-01

Abstract

Recent advances in DNA microarray technology have made it possible to measure the expression level of several thousand of genes simultaneously. The gene expression profiles obtained from microarray techniques have provided the opportunity of early diagnosis of cancer with the use of supervised learning algorithms. As a simple, effective and nonparametric classification method, k-Nearest Neighbor (k-NN) algorithm has recently been applied for the problem of cancer diagnosis and categorization. An obvious problem of traditional k-NN algorithm is that, when the density of training data is uneven, the precision of classification may reduce due to the consideration of first k nearest neighbors but not the differences of distances. A recent solution for this problem is adopting the theory of fuzzy sets and constructing a new membership function based on the similarities. This study has been conducted to demonstrate in what degree the fuzzification of k-NN algorithm can improve the prediction accuracy of cancer classification based on gene expression data. According to the results of the experiments over a six distinct benchmarking dataset spanning 27 diagnostic categories, it reveals that the fuzzy k-NN algorithm promotes the accuracy of cancer classification to a certain degree. Results also encourage the use of this fuzzification technique on similar problems in computational biology.
2008
fuzzy k-nearest neighbor, machine learning, microarray data anlaysis
File in questo prodotto:
File Dimensione Formato  
IC01_A Fuzzy K-NN Approach For Cancer Diagnosis with Microarray Gene Expression Data.pdf

solo utenti autorizzati

Tipologia: Versione dell'editore
Licenza: Copyright dell'editore
Dimensione 135.32 kB
Formato Adobe PDF
135.32 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1121927
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact