Evaluation and optimization of long-DNA capture approaches for the characterization of long microsatellites in repeat expansion disorders

Massimiliano Alfano
2022

Abstract

The advent of long-read sequencing has enhanced our ability to characterize complex genomic regions harboring large structural variations, repetitive elements, abnormal GC content or highly homologous genes. Combining long-read sequencing with enrichment strategies that capture long fragments is a valuable way to reduce analysis costs while maximizing data production on a selected region of interest. Here we evaluated the features and performance of three different long-DNA capture approaches: indirect sequence capture (Samplix’s Xdrop), Cas9-mediated targeted sequencing, and a set of three hybridization-capture methods (PNA, dCas9 and dsDNA probes). The benefits of these approaches in combination with long-read sequencing were assessed for the analysis of FMR1, DMPK, and CNBP repeat expansions, which cause congenital disorders and were selected as case studies for medium-long, long and ultra-long targets, respectively. All methods successfully enriched long target DNA molecules, although the length of the enriched DNA varied across approaches. In particular, Cas9-mediated capture allowed sequencing of molecules up to 50 kbp in length, and thus characterization of a 46.6 kbp repeat, one of the longest achieved with target-enrichment approaches. Despite being the approach most sensitive to input gDNA quality, Cas9-mediated capture also achieved the highest fold-enrichment among the three approaches tested. With the Xdrop-mediated method, we could capture even longer DNA regions (100 kbp) than with Cas9, at only slightly lower enrichment, although these were fragmented into shorter pieces of 5-10 kbp.
In addition, the Xdrop method worked with the lowest gDNA input (10 ng), in contrast to the micrograms of gDNA required by the other methods, suggesting that this approach could be applied to samples derived from prenatal/preimplantation testing, clinical biopsies or even single cells. A drawback of the lower input was the need for Whole Genome Amplification downstream of the enrichment step, which decreased the size of the sequenced molecules and thus limited the applicability of this method. The Xdrop approach also required the highest capital investment, as it depends on a dedicated droplet generator instrument and a cytometer for sorting. The hybridization-based approaches could potentially be the most cost-effective solution, with costs ~10 times lower than Xdrop and Cas9. However, with these methods the downstream recovery of enriched DNA was too low to allow subsequent long-read sequencing. Although hybridization-based approaches are potentially interesting solutions, further protocol optimization aimed at improving the yield of enriched DNA is therefore required for their effective application. Xdrop- and Cas9-mediated workflows enabled successful ONT sequencing of patient genomic DNA harboring known repeat expansions. This demonstrated that these approaches can capture and characterize very long (pathogenic) microsatellites, in high agreement with traditional diagnostic approaches. In addition, the methods allowed accurate discrimination of normal and expanded alleles at single-nucleotide resolution, with simultaneous assessment of repeat length, structure/motif and level of somatic mosaicism, which is not feasible with traditional methods (used either alone or in combination).
Applying these long-DNA capture approaches in the clinical setting could improve patient diagnosis and provide more precise genotype-phenotype correlations, which are still lacking or limited for diseases characterized by large microsatellite expansions. In conclusion, the in-depth evaluation of the strengths and weaknesses of long-DNA capture approaches described in this thesis will promote their wider application for the characterization of pathogenic loci that are only partially resolved by traditional approaches.
Repeat expansion disorders, Targeted sequencing, Cas9-mediated capture, Xdrop, Probes, dCas9, PNA
Files in this record:
Documento_di_Tesi_MassimilianoAlfano.pdf
Embargo until 26/07/2023
Description: Doctoral thesis
Type: Doctoral thesis
License: Restricted access
Size: 6.92 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11562/1070826