RESEARCH PRODUCTS CATALOGUE


On using longer RNA-seq reads to improve transcript prediction accuracy

Kuosmanen, A.; Sobih, A.; Rizzi, R.; Mäkinen, V.; Tomescu, A.

2016-01-01

Abstract

Over the past decade, sequencing read length has increased from tens to hundreds and then to thousands of bases. Current cDNA synthesis methods prevent RNA-seq reads from being long enough to capture entire RNA transcripts, but long reads can still provide connectivity information on chains of multiple exons that are included in transcripts. We demonstrate that exploiting full connectivity information leads to significantly higher prediction accuracy, as measured by the F-score. For this purpose, we implemented the solution to the Minimum Path Cover with Subpath Constraints problem introduced in (Rizzi et al., 2014), which extends the classical Minimum Path Cover problem and was shown to be solvable with min-cost flows. We show that, under hypothetical conditions of perfect sequencing, our approach uses long reads more effectively than two state-of-the-art tools, StringTie and FlipFlop. Even in this setting, the problem is not trivial, and errors in the underlying flow graph introduced by sequencing and alignment errors complicate it further. As such, our work also demonstrates the need to develop a good spliced read aligner for long reads. Our proof-of-concept implementation is available at http://www.cs.helsinki.fi/en/gsa/traphlor. Copyright © 2016 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
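The abstract builds on the classical Minimum Path Cover problem. As a hedged illustration of that base problem only, the sketch below computes the minimum number of paths needed to cover every vertex of a DAG when paths may share vertices, as transcripts share exons in a splicing graph. It is not the authors' Traphlor implementation. It does not handle subpath constraints, and it does not use the min-cost flow formulation cited in the abstract. Instead it uses a swapped-in technique, Fulkerson's reduction: the answer equals the number of vertices minus the size of a maximum bipartite matching on the graph's transitive closure. The function name `min_path_cover_size` and the integer-labelled toy graph are hypothetical.

```python
from typing import List, Set, Tuple


def min_path_cover_size(n: int, edges: List[Tuple[int, int]]) -> int:
    """Hypothetical helper (not Traphlor's API): minimum number of paths
    covering all vertices 0..n-1 of a DAG, where paths may share vertices.

    Swapped-in technique: Fulkerson's reduction to bipartite matching on the
    transitive closure, NOT the min-cost flow formulation of the paper.
    Subpath constraints are NOT handled.
    """
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # Transitive closure: reach[s] holds every vertex reachable from s by a
    # path with at least one edge (iterative DFS from each vertex).
    reach: List[Set[int]] = []
    for s in range(n):
        seen: Set[int] = set()
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v])
        reach.append(seen)

    # Kuhn's augmenting-path algorithm on the bipartite graph with an edge
    # (left copy of u) -> (right copy of v) for every v in reach[u].
    match_right = [-1] * n  # match_right[v] = left vertex currently matched to v

    def try_augment(u: int, visited: Set[int]) -> bool:
        for v in reach[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matching = 0
    for u in range(n):
        if try_augment(u, set()):
            matching += 1

    # Each matched pair chains two vertices into the same path, saving one path.
    return n - matching


# Diamond-shaped toy graph: 0 -> {1, 2} -> 3.
print(min_path_cover_size(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # 2
```

In the diamond, the two covering paths 0-1-3 and 0-2-3 share vertices 0 and 3. This overlap is why the matching runs on the transitive closure rather than on the original edge set, which would only count vertex-disjoint paths.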


	Year
	
				2016
			
	Conference proceedings ISBN
	
				9789897581700
			
	Keywords
	
				Biomedical engineering; Flow graphs; Forecasting; RNA; Connectivity information; Constraints problems; Full connectivities; Long reads; Network flows; Path cover; Prediction accuracy; Splicing graphs; Bioinformatics; Minimum Path Cover; Network flow; RNA-seq; Splicing graph; Transcript prediction
			
	Appears in collections:
	
				04.01 Contribution to conference proceedings

Files in this item:

File	Size	Format
BIOINFORMATICS_2016_49.pdf (authorized users only; Type: Publisher's version; License: Restricted access)	192.87 kB	Adobe PDF

Items in IRIS are protected by copyright, with all rights reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11562/955001
