Bit Catastrophes for the Burrows-Wheeler Transform
Sara Giuliani; Zsuzsanna Liptak
2023-01-01
Abstract
A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which the insertion, deletion, or substitution of one character increases r from constant to Θ(log n), where n is the length of the string. These strings can be interpreted as examples of an increase by either a multiplicative or an additive Θ(log n) factor. As regards the multiplicative factor, they attain the upper bound of O(log n log r) given by Akagi, Funakoshi, and Inenaga [Inf. & Comput. 2023], since here r = O(1). We then give examples of strings in which the insertion, deletion, or substitution of one character increases r by an additive Θ(√n) factor. These strings significantly improve the best known lower bound for an additive factor, Ω(log n) [Giuliani et al., SOFSEM 2021].
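To make the parameter r concrete, the following is a minimal sketch of how one might compute the BWT of a short string and count its equal-letter runs, before and after a single-character edit. It assumes the BWT is taken over the sorted rotations of the string without an end-of-string marker; the function names `bwt`, `count_runs`, `r`, and `substitute` are illustrative and not taken from the paper, and the quadratic-space construction is only suitable for small inputs.

```python
from itertools import groupby


def bwt(s: str) -> str:
    """BWT via sorted rotations (assumption: no end-of-string marker)."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each rotation, in sorted order.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for _ in groupby(s))


def r(s: str) -> int:
    """The parameter r: number of equal-letter runs of the BWT of s."""
    return count_runs(bwt(s))


def substitute(s: str, i: int, c: str) -> str:
    """Return s with the character at position i replaced by c."""
    return s[:i] + c + s[i + 1:]


if __name__ == "__main__":
    s = "banana"
    t = substitute(s, 2, "b")  # one-character substitution (arbitrary choice)
    print(s, bwt(s), r(s))
    print(t, bwt(t), r(t))
```

Comparing `r(s)` with `r(t)` for a string `t` that differs from `s` by one insertion, deletion, or substitution gives the quantity whose growth the abstract describes; the short inputs here only illustrate the definition, not the asymptotic bounds.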