Element-based graph pangenome

LOPATRIELLO GIULIA
2023-01-01

Abstract

Genomic analyses typically rely on a single reference genome, which does not capture the full intraspecies diversity. A pangenome, by contrast, contains the entire genomic content of a species. State-of-the-art pangenome representations fall into two classes: linear pangenomes and nucleotide-based graph pangenomes. The linear model consists of the reference genome plus a set of non-reference representative (NRR) sequences, and it reports whether a gene is present in a given cultivar through presence/absence variation (PAV) analysis. The nucleotide-based graph pangenome displays local similarities and dissimilarities between genomic regions. The two models are therefore complementary: the linear pangenome builds a consensus genome and reports inter-individual gene presence and absence as a table, whereas the nucleotide-based graph pangenome displays all nucleotide variations graphically but cannot be used for gene presence/absence analysis. The aim of this thesis was to create a new pangenome model, called the element-based graph pangenome, in which genes can be represented graphically and analyzed for presence and absence. Briefly, genes were annotated in the genomes of 5 different cultivars of Phaseolus vulgaris through automatic gene prediction. Genes were represented as nodes in the graph, and two nodes were linked only if the corresponding genes were adjacent in the genome. Orthologous genes from different cultivars were then identified and merged into a single node. The resulting element-based graph pangenome was compared with the linear and nucleotide-based graph pangenomes built from the same bean accessions. Because gene copies derived from duplication events (paralogs) were merged, the element-based graph pangenome reported fewer genes than the linear model. Moreover, regions were much clearer to visualize than in the nucleotide-based graph model, since only the presence or absence of genes across cultivars was displayed. Unlike the other pangenome models, the element-based graph pangenome also made it possible to move from the gene level to the nucleotide level in a "zoom-in" visualization that displays local nucleotide similarities and dissimilarities.
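The construction described in the abstract (genes as nodes, links only between adjacent genes, orthologs merged into one node) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the toy cultivar and gene identifiers, and the ortholog-group mapping are all hypothetical, and the sketch assumes that gene order per chromosome and ortholog assignments are already available from upstream tools.

```python
from collections import defaultdict

def build_element_graph(gene_orders, ortholog_group):
    """Build a gene-level graph (hypothetical helper, not the thesis code).

    gene_orders: {cultivar: [[gene_id, ...], ...]}, one inner list per chromosome,
                 genes listed in genomic order.
    ortholog_group: {gene_id: group_id}; a gene without an entry becomes its own node.
    """
    node_cultivars = defaultdict(set)  # node -> cultivars in which it is present
    edge_cultivars = defaultdict(set)  # (node_a, node_b) -> cultivars where adjacent
    for cultivar, chromosomes in gene_orders.items():
        for genes in chromosomes:
            # Merge step: replace each gene by the node of its ortholog group.
            nodes = [ortholog_group.get(g, g) for g in genes]
            for node in nodes:
                node_cultivars[node].add(cultivar)
            # Link only genes that are adjacent in the genome.
            for a, b in zip(nodes, nodes[1:]):
                if a != b:  # adjacent copies merged into the same node add no edge
                    edge_cultivars[tuple(sorted((a, b)))].add(cultivar)
    return node_cultivars, edge_cultivars

def pav_table(node_cultivars, cultivars):
    """Presence (1) / absence (0) of every node in every cultivar."""
    return {
        node: [int(c in present) for c in cultivars]
        for node, present in sorted(node_cultivars.items())
    }

# Toy input: three made-up cultivars, one chromosome each.
gene_orders = {
    "cvA": [["a1", "a2", "a3"]],
    "cvB": [["b1", "b3"]],
    "cvC": [["c1", "c2", "c2b", "c4"]],
}
# Made-up ortholog assignments; c2 and c2b stand for duplicated copies
# that end up in the same group, c4 has no ortholog.
ortholog_group = {
    "a1": "G1", "b1": "G1", "c1": "G1",
    "a2": "G2", "c2": "G2", "c2b": "G2",
    "a3": "G3", "b3": "G3",
}

cultivars = ["cvA", "cvB", "cvC"]
nodes, edges = build_element_graph(gene_orders, ortholog_group)
pav = pav_table(nodes, cultivars)
for node, row in pav.items():
    print(node, row)
```

In this toy input, nine annotated genes collapse into four nodes, which mirrors in miniature why merging gene copies yields fewer reported genes than a gene-by-gene listing. Storing the set of cultivars on each node and edge is one possible design choice; it lets the same structure serve both the graph view and the presence/absence table.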
2023
PANGENOME
Files in this item:
PhD_thesis_Giulia_Lopatriello.pdf (embargoed until 03/03/2024)
Description: Doctoral thesis
Type: Doctoral thesis
License: Publisher's copyright
Size: 4.36 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1115007