The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.

### Algorithms for Jumbled Pattern Matching in Strings

#### Abstract

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.
##### Scheda breve Scheda completa Scheda completa (DC)
2012
Parikh vectors; permuted strings; pattern matching; string algorithms; average case analysis
File in questo prodotto:
File
IJFCS2012.pdf

solo utenti autorizzati

Tipologia: Versione dell'editore
Utilizza questo identificativo per citare o creare un link a questo documento: `https://hdl.handle.net/11562/391083`