We study the complexity of constructing an optimal parsing φ of a string s = s1...sn under the constraint that given a position p in the original text, and the LZ76-like (Lempel Ziv 76) encoding of T based on φ, it is possible to identify/decompress the character sp by performing at most c accesses to the LZ encoding, for a given integer c. We refer to such a parsing φ as a c-bounded access LZ parsing or c-BLZ parsing of s. We show that for any constant c the problem of computing the optimal c-BLZ parsing of a string, i.e., the one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PT AS can exist under the standard complexity assumption P ̸= NP. We also study the ratio between the sizes of an optimal c-BLZ parsing of a string s and an optimal LZ76 parsing of s (which can be greedily computed in polynomial time). For this we establish a non-trivial lower bound Ω(c+1 √|s|) on the size of an optimal parsing for a square free string s, and also show that such a lower bound is tight for a large class of (square free) morphic words. 1 Finally, after showing that under ETH, every algorithm for c-BLZ requires time 2Ω(|s| c ) , hence strongly exponential in the special case c = 1, we show an algorithm matching this bound that can compute an optimal 1-BLZ parsing of a string s in time O∗(1.755|s|)

Hardness and approximability of bounded access Lempel Ziv coding

Cicalese, Ferdinando;
2025-01-01

Abstract

We study the complexity of constructing an optimal parsing φ of a string s = s1...sn under the constraint that given a position p in the original text, and the LZ76-like (Lempel Ziv 76) encoding of T based on φ, it is possible to identify/decompress the character sp by performing at most c accesses to the LZ encoding, for a given integer c. We refer to such a parsing φ as a c-bounded access LZ parsing or c-BLZ parsing of s. We show that for any constant c the problem of computing the optimal c-BLZ parsing of a string, i.e., the one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PT AS can exist under the standard complexity assumption P ̸= NP. We also study the ratio between the sizes of an optimal c-BLZ parsing of a string s and an optimal LZ76 parsing of s (which can be greedily computed in polynomial time). For this we establish a non-trivial lower bound Ω(c+1 √|s|) on the size of an optimal parsing for a square free string s, and also show that such a lower bound is tight for a large class of (square free) morphic words. 1 Finally, after showing that under ETH, every algorithm for c-BLZ requires time 2Ω(|s| c ) , hence strongly exponential in the special case c = 1, we show an algorithm matching this bound that can compute an optimal 1-BLZ parsing of a string s in time O∗(1.755|s|)
2025
Lempel-Ziv compression NP-completeness APX-hardness Algorithms
File in questo prodotto:
File Dimensione Formato  
BLZ-SpecialIssue-revision 2.pdf

accesso aperto

Descrizione: BLZ pdf preprint
Tipologia: Documento in Pre-print
Licenza: Copyright dell'editore
Dimensione 450 kB
Formato Adobe PDF
450 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1175969
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact