A tandem duplication is an operation that converts a string = S = A X B into a string =. T = A X X B . As they appear to be involved in genetic disorders, tandem duplications are widely studied in computational biology. Also, tandem duplication mechanisms have been recently studied in different contexts, from formal languages, to information theory, to error-correcting codes for DNA storage systems. The question of determining the complexity of computing the tandem duplication distance between two given strings was posed by (Leupold et al., 2004) and, very recently, the problem was shown to be NP-hard for the case of unbounded alphabets (Lafond et al., STACS 2020). In this paper, we significantly improve this result and show that the tandem duplication distance problem is NP-hard already for the case of strings over an alphabet of size 5. For a restricted class of strings, we establish the tractability of the existence problem: given strings S and T over the same alphabet, decide whether there exists a sequence of duplications converting S into T. A polynomial time algorithm that solves this existence problem was only known for the case of the binary alphabet.

The Tandem Duplication Distance Problem Is Hard over Bounded Alphabets

Ferdinando Cicalese;
2021-01-01

Abstract

A tandem duplication is an operation that converts a string = S = A X B into a string =. T = A X X B . As they appear to be involved in genetic disorders, tandem duplications are widely studied in computational biology. Also, tandem duplication mechanisms have been recently studied in different contexts, from formal languages, to information theory, to error-correcting codes for DNA storage systems. The question of determining the complexity of computing the tandem duplication distance between two given strings was posed by (Leupold et al., 2004) and, very recently, the problem was shown to be NP-hard for the case of unbounded alphabets (Lafond et al., STACS 2020). In this paper, we significantly improve this result and show that the tandem duplication distance problem is NP-hard already for the case of strings over an alphabet of size 5. For a restricted class of strings, we establish the tractability of the existence problem: given strings S and T over the same alphabet, decide whether there exists a sequence of duplications converting S into T. A polynomial time algorithm that solves this existence problem was only known for the case of the binary alphabet.
978-3-030-79986-1
Tandem duplications, np-completeness
File in questo prodotto:
File Dimensione Formato  
TandemDuplicationHardness-IWOCA2021-CameraReady.pdf

solo utenti autorizzati

Tipologia: Documento in Pre-print
Licenza: Accesso ristretto
Dimensione 380.21 kB
Formato Adobe PDF
380.21 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1063374
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact