The Burrows-Wheeler Transform (BWT) is a powerful transform widely used in string compression and string processing. It produces a reversible permutation of the characters of the input string, often allowing for easier compression of the string, while also enabling fast pattern matching. While the BWT is defined for every word, not every word is the BWT of some word. A characterization of BWT images was given in [Likhomanov and Shur, CSR, 2011], based on the standard permutation of the string. Often an end-of-file character dollar is added to mark the end of a string. Given a string 𝑤, it is an interesting combinatorial question where a $ can be inserted to make 𝑤 the BWT of some string 𝑣dollar. This question was answered in [Giuliani et al. Theor. Comput. Sci. 2021], where an efficient algorithm was presented for computing all such positions (called nice positions), and a characterization of nice positions was given, based on pseudo-cycles in the standard permutation of 𝑤. In this paper, we give a stronger characterization of nice positions: We show that these can be characterized using only essential pseudo-cycles, which constitute a small subset of all possible pseudo- cycles. We present an algorithm to compute all essential pseudo-cycles of a word 𝑤. In the second part of the paper, we study nice positions of fully clustered words: these are words 𝑤 whose number of runs equals the number of distinct characters occurring in 𝑤. Fully clustered words are of particular interest due to their extreme compressibility, and words whose BWT is fully clustered have been studied extensively. We are interested in the number of nice positions of fully clustered words, as well as in the number of fully clustered words with 𝑘 nice positions, for fixed 𝑘 and word length.

When a Dollar in a Fully Clustered Word Makes a BWT

Sara Giuliani;Zsuzsanna Liptak;Francesco Masillo
2022-01-01

Abstract

The Burrows-Wheeler Transform (BWT) is a powerful transform widely used in string compression and string processing. It produces a reversible permutation of the characters of the input string, often allowing for easier compression of the string, while also enabling fast pattern matching. While the BWT is defined for every word, not every word is the BWT of some word. A characterization of BWT images was given in [Likhomanov and Shur, CSR, 2011], based on the standard permutation of the string. Often an end-of-file character dollar is added to mark the end of a string. Given a string 𝑤, it is an interesting combinatorial question where a $ can be inserted to make 𝑤 the BWT of some string 𝑣dollar. This question was answered in [Giuliani et al. Theor. Comput. Sci. 2021], where an efficient algorithm was presented for computing all such positions (called nice positions), and a characterization of nice positions was given, based on pseudo-cycles in the standard permutation of 𝑤. In this paper, we give a stronger characterization of nice positions: We show that these can be characterized using only essential pseudo-cycles, which constitute a small subset of all possible pseudo- cycles. We present an algorithm to compute all essential pseudo-cycles of a word 𝑤. In the second part of the paper, we study nice positions of fully clustered words: these are words 𝑤 whose number of runs equals the number of distinct characters occurring in 𝑤. Fully clustered words are of particular interest due to their extreme compressibility, and words whose BWT is fully clustered have been studied extensively. We are interested in the number of nice positions of fully clustered words, as well as in the number of fully clustered words with 𝑘 nice positions, for fixed 𝑘 and word length.
2022
combinatorics on words, Burrows-Wheeler-Transform, permutations, string algorithms, fully clustered words
File in questo prodotto:
File Dimensione Formato  
ICTCS2022_4242.pdf

accesso aperto

Tipologia: Versione dell'editore
Licenza: Copyright dell'editore
Dimensione 1.27 MB
Formato Adobe PDF
1.27 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1091827
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact