Enriched Bag of Words for Protein Remote Homology Detection
LOVATO, Pietro; BICEGO, Manuele
2016-01-01
Abstract
One of the most challenging Pattern Recognition problems in Bioinformatics is detecting whether two proteins with very low sequence similarity are functionally or structurally related: the so-called Protein Remote Homology Detection (PRHD) problem. Although approaches based on the “Bag of Words” (BoW) paradigm have shown high potential in this context, there is still room for refinement, especially by exploiting the specific characteristics of the application. In this paper we propose a modified BoW representation for PRHD, which enriches the classic BoW with information derived from the evolutionary history of the mutations each protein has undergone. An experimental comparison on a standard benchmark demonstrates the feasibility of the proposed technique.