CATALOGO DEI PRODOTTI DELLA RICERCA

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.

Algorithms for Jumbled Pattern Matching in Strings

Péter Burcsi;Cicalese, Ferdinando;Gabriele Fici;Liptak, Zsuzsanna

2012-01-01

Abstract

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a1,...,aσ} is defined as the vector of multiplicities of the characters, p(s) = (p1,...,pσ), where pi =|{j|sj =ai}|.Parikhvectorqoccursinsifshasasubstringtwithp(t)=q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2) time.The second algorithm finds all occurrences of a given Parikh vector in a text over anarbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More pre-cisely, we present two variants of the algorithm, both using an O(n) size data structure,each of which can be constructed in O(n) time. The first solution is very simple and σ 1/2 log measy to implement and leads to an expected query time of O(n( log σ ) √m ), where m = i qi is the length of a string with Parikh vector q. The second uses wavelet treesσ 1/2 1 and improves the expected runtime to O(n( log σ ) √m ), i.e., by a factor of log m.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2012
			
	Parole chiave
	
				Parikh vectors; permuted strings; pattern matching; string algorithms; average case analysis
			
	Appare nelle tipologie:
	
				01.01 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
IJFCS2012.pdf solo utenti autorizzati Tipologia: Versione dell'editore Licenza: Copyright dell'editore Dimensione 781 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	781 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/391083

Citazioni

ND

58

48

social impact