RESEARCH PRODUCTS CATALOGUE

Hardness and approximation of multiple sequence alignment with column score

Caucchiolo, A; Cicalese, F

2023-01-01

Abstract

Multiple Sequence Alignment (MSA) is a well-known problem in computational biology. Many scoring functions have been introduced to evaluate the quality of a solution, the most widely used being the Sum-of-Pairs score (SP-score). It is known that computing the best MSA under the SP-score is NP-hard. In this paper, we introduce a variant of the Column score (defined in Thompson et al. 1999), which we refer to as the Selective Column score: given a symbol a ∈ Σ, the score of the i-th column is one if and only if all symbols in that column are a, and zero otherwise. The a-column score of an alignment is then the number of columns consisting only of the character a. We show that finding the optimal MSA under the Selective Column score is NP-hard for all alphabets of size |Σ| ≥ 2, and that the associated maximization problem is poly-APX-hard. We also give an approximation algorithm that almost matches the inapproximability bound. © 2023 Elsevier B.V. All rights reserved.
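As an illustration of the definition in the abstract, the a-column score of a given alignment can be computed as in the minimal sketch below. It is not taken from the paper: the function name `a_column_score`, the representation of an alignment as equal-length strings, and the use of `-` as the gap symbol are assumptions made only for this example. The sketch evaluates the score of an alignment that is already given; it does not search for an optimal alignment, which is the problem the abstract reports as NP-hard.

```python
from typing import Sequence


def a_column_score(alignment: Sequence[str], a: str) -> int:
    """Count the columns of `alignment` that consist only of the symbol `a`.

    Assumption (not from the paper): the alignment is given as a list of
    equal-length strings, one per sequence, with '-' standing for a gap.
    """
    if not alignment:
        return 0
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*alignment) iterates over columns; a column scores 1 only if
    # every row carries `a` there (a gap or any other symbol scores 0).
    return sum(1 for column in zip(*alignment) if all(s == a for s in column))


# Hypothetical alignment of three sequences over the alphabet {A, B}.
example = ["AAB-", "A-BA", "AABA"]
print(a_column_score(example, "A"))  # prints 1 (only the first column is all 'A')
print(a_column_score(example, "B"))  # prints 1 (only the third column is all 'B')
```

Note that a column containing a gap never contributes to the score, whichever symbol a is selected.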

Year: 2023

Keywords: Multiple sequence alignment; Column score; NP-completeness

Appears in types: 01.01 Journal article

Files in this product:

File	Size	Format
1-s2.0-S0304397522007654-main.pdf (Open Access from 02/01/2025; Description: paper-postprint; Type: Publisher's version; License: Publisher's copyright)	329.81 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1093306
