Given a string π = π[1..π] of length π, and integers π and π, such that π > π β₯ 2π > 0, we consider the problem of compressing the string π formed by concatenating the substrings of π of length π starting at positions π β‘ 1 (mod π). In particular, we provide an upper bound of (2π β π)/π + 2π§ + (π β π) on the size of the Lempel-Ziv (LZ77) parsing of π, where π§ is the size of the parsing of π. We also show that a related bound holds regardless of the order in which the substrings are concatenated in the formation of π. If π is viewed as a genome sequence, the above substring sampling process corresponds to an idealized model of short read DNA sequencing.
On Compressing Collections of Substring Samples
Sara Giuliani;Zsuzsanna Liptak;
2022-01-01
Abstract
Given a string π = π[1..π] of length π, and integers π and π, such that π > π β₯ 2π > 0, we consider the problem of compressing the string π formed by concatenating the substrings of π of length π starting at positions π β‘ 1 (mod π). In particular, we provide an upper bound of (2π β π)/π + 2π§ + (π β π) on the size of the Lempel-Ziv (LZ77) parsing of π, where π§ is the size of the parsing of π. We also show that a related bound holds regardless of the order in which the substrings are concatenated in the formation of π. If π is viewed as a genome sequence, the above substring sampling process corresponds to an idealized model of short read DNA sequencing.File | Dimensione | Formato | |
---|---|---|---|
ICTCS2022_1532.pdf
accesso aperto
Tipologia:
Versione dell'editore
Licenza:
Copyright dell'editore
Dimensione
1.5 MB
Formato
Adobe PDF
|
1.5 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.