CATALOGO DEI PRODOTTI DELLA RICERCA

Given a string s, the Parikh vector of s, denoted p(s), counts the multi- plicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u,v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u ≤ q ≤ v. This defini- tion encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then dis- cuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t) ≤ v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.

On Approximate Jumbled Pattern Matching in Strings.

Péter Burcsi;Cicalese, Ferdinando;Gabriele Fici;Liptak, Zsuzsanna

2012-01-01

Abstract

Given a string s, the Parikh vector of s, denoted p(s), counts the multi- plicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u,v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u ≤ q ≤ v. This defini- tion encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then dis- cuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t) ≤ v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2012
			
	Parole chiave
	
				String algorithms, Pattern matching, Parikh vectors, Average case analysis, Approximate search, Permuted strings
			
	Appare nelle tipologie:
	
				01.01 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
TOCS2012.pdf solo utenti autorizzati Tipologia: Versione dell'editore Licenza: Copyright dell'editore Dimensione 355.78 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	355.78 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/390644

Citazioni

ND

37

30

social impact