RESEARCH PRODUCTS CATALOGUE

RAP: a new computer program for de novo identification of repeated sequences in whole genomes

Campagna, Davide;Romualdi, Chiara;Vitulo, Nicola;Del Favero, Micky;Lexa, Matej;Cannata, Nicola;Valle, Giorgio

2005-01-01

Abstract

Motivation: DNA repeats are a common feature of most genomic sequences. Their de novo identification remains difficult, even though it is a crucial step in genomic analysis and oligonucleotide design. Several efficient algorithms based on word counting are available, but words that are too short decrease specificity, while long words decrease sensitivity, particularly for degenerate repeats.

Results: The Repeat Analysis Program (RAP) is based on a new word-counting algorithm optimized for high-resolution repeat identification using gapped words. Many different overlapping gapped words can be counted at the same genomic position, which produces a better signal than a single ungapped word. This results in better specificity both in low-frequency detection, where RAP can identify sequences repeated only once, and in the detection of highly divergent repeats, producing a generally high score in most intron sequences.
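The gapped-word idea described in the abstract can be illustrated with a minimal Python sketch. This is not RAP's implementation: the mask scheme (fixed span, fixed number of sampled positions, first and last positions always sampled), the scoring rule, and the function names `gapped_masks`, `count_gapped_words` and `position_scores` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def gapped_masks(span, weight):
    """Yield masks (tuples of offsets) of length `span` sampling `weight` positions.

    Assumption for this sketch: the first and last offsets are always sampled,
    so every mask covers the full span. Requires 2 <= weight <= span.
    """
    inner = range(1, span - 1)
    for chosen in combinations(inner, weight - 2):
        yield (0, *chosen, span - 1)


def gapped_word(seq, start, mask):
    """Read the gapped word that starts at `start` under `mask`."""
    return "".join(seq[start + offset] for offset in mask)


def count_gapped_words(seq, masks):
    """Count every (mask, word) pair observed anywhere in `seq`."""
    counts = Counter()
    for mask in masks:
        span = mask[-1] + 1
        for start in range(len(seq) - span + 1):
            counts[(mask, gapped_word(seq, start, mask))] += 1
    return counts


def position_scores(seq, masks, counts):
    """Score each start position by summing, over all masks, the number of
    *other* occurrences of the gapped word that starts there."""
    scores = [0] * len(seq)
    for mask in masks:
        span = mask[-1] + 1
        for start in range(len(seq) - span + 1):
            word = gapped_word(seq, start, mask)
            scores[start] += counts[(mask, word)] - 1
    return scores
```

With `masks = list(gapped_masks(4, 3))` and the toy sequence `"ACGTGGACTT"`, the two copies `ACGT` and `ACTT` differ at one base, so an ungapped 4-letter word would not match; the mask `(0, 1, 3)` skips the mismatching base, and both copies receive a non-zero score. Summing over several overlapping masks is what lets a degenerate repeat still accumulate signal in this sketch.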

Year: 2005

Keywords: Algorithms; Animals; Caenorhabditis elegans; Chromosome Mapping; DNA; Pattern Recognition, Automated; Repetitive Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Software

Appears in types: 01.01 Journal article

Files in this product:

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1013410
