
Hardness and approximation of multiple sequence alignment with column score

Caucchiolo, A; Cicalese, F
2023-01-01

Abstract

Multiple Sequence Alignment (MSA) is a well-known problem in computational biology. Many scoring functions have been introduced to evaluate the quality of a solution, the most widely used being the Sum-of-Pairs score (SP-score). Computing the best MSA under the SP-score is known to be NP-hard. In this paper, we introduce a variant of the Column score (defined in Thompson et al. 1999), which we refer to as the Selective Column score: given a symbol a ∈ Σ, the score of the i-th column is one if and only if all symbols in that column are a, and zero otherwise. The a-column score of an alignment is then the number of columns consisting only of the character a. We show that finding the optimal MSA under the Selective Column score is NP-hard for all alphabets of size |Σ| ≥ 2, and that the associated maximization problem is poly-APX-hard. We also give an approximation algorithm that almost matches the inapproximability bound. © 2023 Elsevier B.V. All rights reserved.
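To make the definition concrete, the following minimal Python sketch evaluates the a-column score of an alignment that is already given. It is not the paper's algorithm and does not search for an optimal alignment. The function name `selective_column_score`, the representation of an alignment as equal-length strings, and the use of `-` as the gap symbol are all assumptions made for illustration.

```python
from typing import Sequence


def selective_column_score(alignment: Sequence[str], a: str) -> int:
    """Count the columns of `alignment` that consist only of the symbol `a`.

    `alignment` is assumed to be a list of equal-length rows, one per
    sequence, with gaps written as '-' (an illustrative convention).
    """
    if not alignment:
        return 0
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*alignment) iterates over the columns of the alignment.
    return sum(1 for column in zip(*alignment) if all(s == a for s in column))


# Hypothetical toy alignment over the alphabet {a, b}.
toy_alignment = ["ab-a", "a-ba", "abba"]
toy_score = selective_column_score(toy_alignment, "a")
```

In the toy alignment only the first and last columns contain the symbol a in every row, so those are the only columns that contribute to the score.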
Year: 2023
Keywords: Multiple sequence alignment; Column score; NP-completeness
Files in this item:
1-s2.0-S0304397522007654-main.pdf

Embargoed until 01/01/2025

Description: paper-postprint
Type: Publisher's version
License: Publisher's copyright
Size: 329.81 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1093306