Element-based graph pangenome

Lopatriello, Giulia

Genomic analyses typically rely on a single reference genome, which does not represent the full intraspecies diversity. A pangenome, instead, contains the whole genome content of a species. State-of-the-art pangenome representations fall into two models: linear and nucleotide-based graph pangenomes. The linear model consists of the reference genome plus a set of non-representative reference (NRR) sequences, and it shows whether a gene is present in a given cultivar through presence/absence variation (PAV) analysis. The nucleotide-based graph pangenome displays local similarities and dissimilarities between genomic regions. The two models are therefore complementary. The linear pangenome creates a consensus genome and reports inter-individual gene presence and absence in a table. The nucleotide-based graph pangenome displays all possible nucleotide variations graphically but cannot be used for gene presence and absence analysis.

The aim of this thesis was to create a new pangenome model, called the element-based graph pangenome, in which genes can be represented graphically and their presence and absence can be analyzed. Briefly, genes were annotated by automatic gene prediction in the genomes of five cultivars of Phaseolus vulgaris. Each gene was represented as a node in the graph, and two nodes were linked only if the corresponding genes were adjacent in the genome. Orthologous genes from different cultivars were then identified and merged into a single node. The resulting element-based graph pangenome was compared with linear and nucleotide-based graph pangenomes built from the same bean accessions.

The results showed that fewer genes were reported in the element-based graph pangenome than in the linear model, because gene copies derived from duplication events (paralogs) were merged.
Moreover, genomic regions were visualized much more clearly than in the nucleotide-based graph model, since only the presence or absence of genes across cultivars was displayed. Unlike other pangenome models, the element-based graph pangenome provided the advantage of moving from gene-level to nucleotide-level information in a "zoom-in" visualization that displays local nucleotide similarities and dissimilarities.
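The construction described above can be illustrated with a minimal Python sketch. It assumes that each cultivar's annotation is available as ordered lists of gene IDs per chromosome, and that an orthology step (not specified here) has already mapped genes to ortholog group IDs. All function names, cultivar labels, gene IDs and group IDs are hypothetical; this is not the implementation used in the thesis. Genes sharing a group collapse into one node, so paralogs placed in the same group are merged, which is one way the gene count can drop relative to a linear pangenome. The "zoom-in" helper further assumes the per-cultivar sequences of a node have already been aligned to equal length.

```python
from collections import defaultdict


def build_element_graph(gene_orders, ortholog_group):
    """Build an element-based pangenome graph (hypothetical sketch).

    gene_orders: cultivar -> list of chromosomes, each an ordered list of gene IDs.
    ortholog_group: gene ID -> ortholog group ID; unassigned genes keep their own ID.
    Returns (edges, presence):
      edges: set of sorted (node, node) pairs adjacent in at least one genome.
      presence: node -> set of cultivars that contain the node.
    """
    edges = set()
    presence = defaultdict(set)
    for cultivar, chromosomes in gene_orders.items():
        for genes in chromosomes:
            # Replace each gene by its node: genes in the same group become one node.
            nodes = [ortholog_group.get(g, g) for g in genes]
            for node in nodes:
                presence[node].add(cultivar)
            # Link only genes that are adjacent on the same chromosome.
            for left, right in zip(nodes, nodes[1:]):
                if left != right:  # merged tandem copies do not create a self-loop
                    edges.add(tuple(sorted((left, right))))
    return edges, dict(presence)


def pav_table(presence, cultivars):
    """Presence/absence matrix: node -> 1/0 per cultivar, in the given order."""
    return {node: [int(c in found) for c in cultivars]
            for node, found in sorted(presence.items())}


def variable_sites(aligned):
    """'Zoom-in' on one node: 0-based alignment columns where cultivars differ."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


# Hypothetical toy data: three cultivars, one chromosome each.
# "a2p" is a paralog of "a2" and is assigned to the same group, OG2.
GENE_ORDERS = {
    "A": [["a1", "a2", "a2p", "a3"]],
    "B": [["b1", "b2"]],
    "C": [["c1", "c3", "c4"]],
}
ORTHOLOGS = {
    "a1": "OG1", "b1": "OG1", "c1": "OG1",
    "a2": "OG2", "a2p": "OG2", "b2": "OG2",
    "a3": "OG3", "c3": "OG3",
}  # "c4" has no ortholog and stays a cultivar-specific node

edges, presence = build_element_graph(GENE_ORDERS, ORTHOLOGS)
pav = pav_table(presence, ["A", "B", "C"])
sites = variable_sites({"A": "ATGCA", "B": "ATGTA", "C": "ATCCA"})
```

In this toy example nine annotated genes collapse into four nodes (OG1, OG2, OG3, c4), and the PAV row for OG2 is [1, 1, 0]. Storing presence as a set per node keeps the gene-level view and the PAV table in one structure, while the nucleotide detail is only computed when a node is inspected.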
