CATALOGO DEI PRODOTTI DELLA RICERCA

We investigate a problem which arises in computational biology: Given a constant-size alpha- bet A with a weight function : A → N, nd an ecient data structure and query algorithm solving the following problem: For a string over A and a weight M ∈ N, decide whether contains a substring with weight M, where the weight of a string is the sum of the weights of its letters (ONE-STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e., indices i 6 j such that the substring beginning at position i and ending at position j has weight M. We allow preprocessing of the string and measure eciency in two parameters: storage space required for the preprocessed data and running time of the query al- gorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length n of the input string . Among others, we present two non-trivial ecient algorithms: LOOKUP solves the problem with O(n) storage space and O(n=logn) time; INTERVAL solves the problem for binary alphabets with O(n) storage space in O(logn) query time. We introduce other variants of the problem and sketch how our algorithms may be extended for these variants. Finally, we discuss combinatorial properties of weighted strings.

Algorithmic Complexity of Protein Identification: Combinatorics of Weighted Strings

Mark Cieliebak;Thomas Erlebach;Liptak, Zsuzsanna;Jens Stoye;Emo Welzl

2004-01-01

Abstract

We investigate a problem which arises in computational biology: Given a constant-size alpha- bet A with a weight function : A → N, nd an ecient data structure and query algorithm solving the following problem: For a string over A and a weight M ∈ N, decide whether contains a substring with weight M, where the weight of a string is the sum of the weights of its letters (ONE-STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e., indices i 6 j such that the substring beginning at position i and ending at position j has weight M. We allow preprocessing of the string and measure eciency in two parameters: storage space required for the preprocessed data and running time of the query al- gorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length n of the input string . Among others, we present two non-trivial ecient algorithms: LOOKUP solves the problem with O(n) storage space and O(n=logn) time; INTERVAL solves the problem for binary alphabets with O(n) storage space in O(logn) query time. We introduce other variants of the problem and sketch how our algorithms may be extended for these variants. Finally, we discuss combinatorial properties of weighted strings.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
			2004
		
	Parole chiave
	
			Computational Biology, Protein Identification, Weighted Strings
		
	Appare nelle tipologie:
	
			01.01 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
DAM_Cossac2004.pdf solo utenti autorizzati Tipologia: Versione dell'editore Licenza: Copyright dell'editore Dimensione 318.87 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	318.87 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/391086

Citazioni

ND

16

11

social impact