When a dollar makes a BWT

Giuliani, Sara;Lipták, Zsuzsanna;Masillo, Francesco;Rizzi, Romeo
2021-01-01

Abstract

The Burrows-Wheeler Transform (BWT) is a reversible string transformation that plays a central role in text compression and is fundamental to many modern bioinformatics applications. The BWT is a permutation of the characters of the input string that is generally more compressible than the original and allows several types of queries to be answered more efficiently. It is easy to see that not every string is a BWT image, and exact characterizations of BWT images are known. We investigate a related combinatorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $-character. Given a string w, we ask in which positions, if any, the $-character can be inserted to turn w into the BWT image of a word ending with $. We show that this depends only on the standard permutation of w and present an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic-time algorithm. We also give a combinatorial characterization of such positions and develop bounds on their number and value. This is an extended version of [Giuliani et al. ICTCS 2019].
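
To make the question concrete, here is a minimal Python sketch of the naive approach the abstract refers to. It is not the paper's O(n log n) algorithm. It relies on the known characterization that a string is a BWT image exactly when its standard permutation is a single cycle (a string with exactly one $ cannot come from a non-primitive word). The function names, the 0-based insertion index, and the assumption that w itself contains no $ are illustrative choices, not taken from the paper.

```python
def is_bwt_of_dollar_word(v: str) -> bool:
    """Return True if v (containing exactly one '$') is the BWT of a word ending in '$'."""
    # Stable sort of positions by character, with '$' forced to be smallest.
    # This list is the inverse of the standard permutation of v, so it has
    # the same cycle structure as the standard permutation itself.
    order = sorted(range(len(v)), key=lambda i: (v[i] != "$", v[i]))
    # Follow the cycle through position 0 and count its length.
    steps, j = 0, 0
    while True:
        j = order[j]
        steps += 1
        if j == 0:
            break
    # v is a BWT image iff that cycle covers every position.
    return steps == len(v)


def dollar_positions(w: str) -> list[int]:
    """All 0-based indices i such that w[:i] + '$' + w[i:] is a BWT image.

    Assumes w does not already contain '$'.
    """
    return [
        i
        for i in range(len(w) + 1)
        if is_bwt_of_dollar_word(w[:i] + "$" + w[i:])
    ]


print(dollar_positions("annbaa"))  # [4, 6]
```

For w = annbaa, index 4 yields annb$aa, the BWT of banana$, and index 6 yields annbaa$, the BWT of nabana$. Because this sketch re-sorts for every candidate position, it runs in O(n² log n) time; replacing the sort with a counting sort would give the quadratic bound mentioned above.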
2021
combinatorics on words, Burrows-Wheeler-Transform, permutations, splay trees, efficient algorithms
Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1035251