A tandem duplication is an operation that converts a string = S = A X B into a string =. T = A X X B . As they appear to be involved in genetic disorders, tandem duplications are widely studied in computational biology. Also, tandem duplication mechanisms have been recently studied in different contexts, from formal languages, to information theory, to error-correcting codes for DNA storage systems. The question of determining the complexity of computing the tandem duplication distance between two given strings was posed by (Leupold et al., 2004) and, very recently, the problem was shown to be NP-hard for the case of unbounded alphabets (Lafond et al., STACS 2020). In this paper, we significantly improve this result and show that the tandem duplication distance problem is NP-hard already for the case of strings over an alphabet of size 5. For a restricted class of strings, we establish the tractability of the existence problem: given strings S and T over the same alphabet, decide whether there exists a sequence of duplications converting S into T. A polynomial time algorithm that solves this existence problem was only known for the case of the binary alphabet.
The Tandem Duplication Distance Problem Is Hard over Bounded Alphabets
Ferdinando Cicalese;
2021-01-01
Abstract
A tandem duplication is an operation that converts a string = S = A X B into a string =. T = A X X B . As they appear to be involved in genetic disorders, tandem duplications are widely studied in computational biology. Also, tandem duplication mechanisms have been recently studied in different contexts, from formal languages, to information theory, to error-correcting codes for DNA storage systems. The question of determining the complexity of computing the tandem duplication distance between two given strings was posed by (Leupold et al., 2004) and, very recently, the problem was shown to be NP-hard for the case of unbounded alphabets (Lafond et al., STACS 2020). In this paper, we significantly improve this result and show that the tandem duplication distance problem is NP-hard already for the case of strings over an alphabet of size 5. For a restricted class of strings, we establish the tractability of the existence problem: given strings S and T over the same alphabet, decide whether there exists a sequence of duplications converting S into T. A polynomial time algorithm that solves this existence problem was only known for the case of the binary alphabet.File | Dimensione | Formato | |
---|---|---|---|
TandemDuplicationHardness-IWOCA2021-CameraReady.pdf
solo utenti autorizzati
Tipologia:
Documento in Pre-print
Licenza:
Accesso ristretto
Dimensione
380.21 kB
Formato
Adobe PDF
|
380.21 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.