A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which insertion, deletion, resp. substitution of one character increases r from constant to Theta(log n), where n is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive Theta(log n)-factor. As regards the multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023] of O(log n log r), since here r = O(1). We then give examples of strings in which insertion, deletion, resp. substitution of a character increases r by a Theta(\sqrt(n)) additive factor. These strings significantly improve the best known lower bound for an additive factor of Omega(log n) [Giuliani et al., SOFSEM 2021].

Bit Catastrophes for the Burrows-Wheeler Transform

Lipták, Zsuzsanna;
2025-01-01

Abstract

A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which insertion, deletion, resp. substitution of one character increases r from constant to Theta(log n), where n is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive Theta(log n)-factor. As regards the multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023] of O(log n log r), since here r = O(1). We then give examples of strings in which insertion, deletion, resp. substitution of a character increases r by a Theta(\sqrt(n)) additive factor. These strings significantly improve the best known lower bound for an additive factor of Omega(log n) [Giuliani et al., SOFSEM 2021].
2025
Burrows-Wheeler transform
Equal-letter run
Repetitiveness measure
Sensitivity
File in questo prodotto:
File Dimensione Formato  
s00224-024-10212-9.pdf

accesso aperto

Licenza: Creative commons
Dimensione 867.2 kB
Formato Adobe PDF
867.2 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1161651
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact