The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.

Algorithms for Jumbled Pattern Matching in Strings

Cicalese, Ferdinando;Liptak, Zsuzsanna
2012-01-01

Abstract

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.
2012
Parikh vectors; permuted strings; pattern matching; string algorithms; average case analysis
File in questo prodotto:
File Dimensione Formato  
IJFCS2012.pdf

solo utenti autorizzati

Tipologia: Versione dell'editore
Licenza: Copyright dell'editore
Dimensione 781 kB
Formato Adobe PDF
781 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/391083
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 55
  • ???jsp.display-item.citation.isi??? 39
social impact