A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny frac- tion of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (less than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a lap- top computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm’s potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is availabe (in C++ and python) at https://github.com/lucaparmigiani/ rnd dbseq.

A BWT-based algorithm for random de Bruijn sequence construction

Zsuzsanna Liptak;
2024-01-01

Abstract

A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny frac- tion of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (less than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a lap- top computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm’s potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is availabe (in C++ and python) at https://github.com/lucaparmigiani/ rnd dbseq.
2024
De Bruijn sequence, Burrows-Wheeler Transform, extended BWT, random generation, spanning tree, standard permutation, Lyndon words
File in questo prodotto:
File Dimensione Formato  
545566_1_En_9_Chapter_Author.pdf

solo utenti autorizzati

Tipologia: Documento in Pre-print
Licenza: Copyright dell'editore
Dimensione 611.04 kB
Formato Adobe PDF
611.04 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1120694
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact