### Hardness and approximation of multiple sequence alignment with column score

#### Abstract

Multiple Sequence Alignment (MSA) is a well-known problem in computational biology. Many scoring functions have been introduced to evaluate the quality of a solution, the most widely used being the Sum-of-Pairs score (SP-score). Computing the best MSA under the SP-score is known to be NP-hard.

In this paper, we introduce a variant of the Column score (defined in Thompson et al. 1999), which we refer to as the Selective Column score: given a symbol a ∈ Σ, the score of the i-th column is one if and only if all symbols in that column are a, and zero otherwise. The a-column score of an alignment is then the number of columns consisting only of the character a.

We show that finding the optimal MSA under the Selective Column score is NP-hard for all alphabets of size |Σ| ≥ 2, and that the associated maximization problem is poly-APX-hard. We also give an approximation algorithm that almost matches the inapproximability bound.

(c) 2023 Elsevier B.V. All rights reserved.
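
To make the definition concrete, the following minimal sketch computes the a-column score of an alignment that is already given. It is an illustration only, not the paper's algorithm: the function name `selective_column_score`, the representation of an alignment as equal-length strings, and the use of `-` as the gap symbol are all assumptions made here.

```python
from typing import Sequence


def selective_column_score(alignment: Sequence[str], a: str) -> int:
    """Count the columns of `alignment` that consist only of the symbol `a`.

    `alignment` is assumed to be a list of equal-length rows, one per
    sequence, with gaps written as some symbol other than `a` (here '-').
    """
    if not alignment:
        return 0
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*alignment) iterates over the columns of the alignment.
    return sum(
        1 for column in zip(*alignment) if all(symbol == a for symbol in column)
    )


# Hypothetical alignment of the sequences "aba", "aba" and "abba".
example = ["ab-a", "a-ba", "abba"]
print(selective_column_score(example, "a"))  # prints 2
```

Scoring a fixed alignment in this way takes time linear in its size. The hardness results stated in the abstract concern the separate task of choosing gap positions so that this score is maximized.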

Year: 2023
Keywords: Multiple sequence alignment; Column score; NP-completeness

Files in this item:
1-s2.0-S0304397522007654-main.pdf

embargoed until 01/01/2025

Description: paper-postprint
Type: Publisher's version
Use this identifier to cite or link to this document: `https://hdl.handle.net/11562/1093306`