Prefix-Free Parsing for Merging Big BWTs

Diego Díaz-Domínguez; Travis Gagie; Veronica Guerrini; Ben Langmead; Zsuzsanna Lipták; Giovanni Manzini; Francesco Masillo; Vikram Shivakumar

2025-01-01

Abstract

When building Burrows-Wheeler Transforms (BWTs) of truly huge datasets, prefix-free parsing (PFP) can use an unreasonable amount of memory. In this paper we show that if a dataset can be broken down into small datasets that are not very similar to each other (such as collections of many copies of genomes of each of several species, or collections of many copies of each of the human chromosomes), then we can drastically reduce PFP’s memory footprint by building the BWTs of the small datasets and then merging them into the BWT of the whole dataset.
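
The build-then-merge idea in the abstract can be illustrated with a toy sketch. The code below is not the paper's PFP-based method: it is a naive illustration that assumes each string ends with its own terminator (smaller than any letter, with ties broken by string order), and it keeps every suffix in memory, which a real merger would avoid. The function names `bwt_rows`, `bwt_of`, and `merge_rows`, and the example strings, are hypothetical.

```python
import heapq


def bwt_rows(strings, first_id=0):
    """Sorted (suffix, string_id, preceding_char) rows of a small collection.

    Assumption: "\0" stands in for a per-string terminator that is smaller
    than any letter; equal suffixes are ordered by string_id.
    """
    rows = []
    for sid, s in enumerate(strings, start=first_id):
        for k in range(len(s) + 1):
            suffix = s[k:] + "\0"
            # Character before the suffix; "$" marks the start of a string.
            prev = s[k - 1] if k > 0 else "$"
            rows.append((suffix, sid, prev))
    rows.sort()
    return rows


def bwt_of(rows):
    """Read the BWT off the sorted rows (one preceding character per row)."""
    return "".join(prev for _, _, prev in rows)


def merge_rows(rows_a, rows_b):
    """Interleave two sorted row lists; neither list is re-sorted."""
    return list(heapq.merge(rows_a, rows_b))


# Two small, dissimilar collections (made-up strings).
group_a = ["GATTACA", "GATTAGA"]
group_b = ["CCGTCC", "CCGTCA"]

rows_a = bwt_rows(group_a)
rows_b = bwt_rows(group_b, first_id=len(group_a))
merged = merge_rows(rows_a, rows_b)

# The merged BWT matches the BWT built directly from the whole dataset.
assert bwt_of(merged) == bwt_of(bwt_rows(group_a + group_b))
```

The point of the sketch is only that the BWT of the whole dataset is an interleaving of the BWTs of the parts: the relative order of rows inside each small BWT is preserved, so merging amounts to deciding which part supplies the next character.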

Year: 2025

Keywords: Burrows-Wheeler Transform, Prefix-free parsing, Low-memory algorithms, Pangenomics

Publication type: 04.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1173176
