TheBurrows-Wheeler-Transform(BWT)isareversiblestring transformation which plays a central role in text compression and is fun- damental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general better compressible and allows to answer several different query types more efficiently than the original string. It is easy to see that not every string is a BWT image, and exact charac- terizations of BWT images are known. We investigate a related combi- natorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $ character. We ask, given a string w, in which positions, if any, can the $-character be inserted to turn w into the BWT image of a word ending with the sentinel character. We show that this depends only on the standard permutation of w and give a combinatorial characterization of such positions via this permutation. We then develop an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic time algorithm.

When a Dollar Makes a BWT

Sara Giuliani;Zsuzsanna Lipták;Romeo Rizzi
2019-01-01

Abstract

TheBurrows-Wheeler-Transform(BWT)isareversiblestring transformation which plays a central role in text compression and is fun- damental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general better compressible and allows to answer several different query types more efficiently than the original string. It is easy to see that not every string is a BWT image, and exact charac- terizations of BWT images are known. We investigate a related combi- natorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $ character. We ask, given a string w, in which positions, if any, can the $-character be inserted to turn w into the BWT image of a word ending with the sentinel character. We show that this depends only on the standard permutation of w and give a combinatorial characterization of such positions via this permutation. We then develop an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic time algorithm.
2019
combinatorics on words, Burrows-Wheeler-Transform, permutations, splay trees, efficient algorithms
File in questo prodotto:
File Dimensione Formato  
GiulianiLiptakRizzi_ICTCS2019.pdf

accesso aperto

Tipologia: Versione dell'editore
Licenza: Creative commons
Dimensione 549.95 kB
Formato Adobe PDF
549.95 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1023388
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact