Microbial genomes metadata on public repositories: friends or foes?

Veschetti Laura; Malerba Giovanni
2022

Abstract

Motivation: Genome sequences stored in public repositories are crucial for many microbial analyses, including genome comparison, phylogenetic analysis and species identification. Microbial genomic sequences are uploaded by laboratories all around the world, and metadata input is left to the discretion of the submitter. Based on our experience, we want to draw attention to the fact that metadata is not always fully correct, even for key information such as taxonomic classification, which can still be difficult to establish in the wet lab for some microorganisms. We therefore encourage researchers to perform pre-analytical quality control steps before running downstream analyses.

Methods: Complete genome assemblies of Klebsiella michiganensis and Achromobacter xylosoxidans were downloaded from NCBI RefSeq (n=350, downloaded on 05-11-2021, and n=142, downloaded on 13-11-2019, respectively). Average Nucleotide Identity (ANI), a measure of nucleotide-level genomic similarity between the coding regions of two genomes, was calculated with fastANI v1.33. Genomes showing ANI ≥ 95% were considered to belong to the same species. R version 4.1.1 and the pheatmap and colorspace R packages were used to visualize the ANI results.

Results: We performed an in silico taxonomic analysis of publicly available genomes of two bacterial species, K. michiganensis and A. xylosoxidans. We chose these species because both are opportunistic pathogens (i.e. microorganisms that do not usually infect healthy hosts but establish infections in immunocompromised individuals or patients with underlying diseases) that contribute to the spread of antibiotic resistance in nosocomial settings. The results (Figure 1) indicate that 9% of K. michiganensis genomes (n=31/350) were misclassified as belonging to this species: they showed ANI < 95% when compared with all the other genomic sequences. For A. xylosoxidans, 62% of genomes (n=88/142) were misclassified. In conclusion, we found discrepancies between the reported species-level taxonomic classification and the ANI results, suggesting a probable misclassification of some microorganisms that might compromise downstream analyses. In light of these results, we strongly recommend performing pre-analytical quality control steps to verify taxonomic information. Examples of such controls are ANI calculation and 16S rRNA sequence analysis (same species if identity > 98%) for species evaluation, or in silico multi-locus sequence typing (MLST; schemes for each species are available at https://pubmlst.org) for sequence type classification.
Keywords: Bioinformatics, Microbial genomics, Genomes metadata
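
The ANI-based check described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used fastANI v1.33 and R for visualization). It assumes the tab-separated table that fastANI writes for all-versus-all runs (query, reference, ANI, mapped fragments, total fragments) and applies one literal reading of the criterion in the Results: a genome is flagged when its ANI stays below 95% against every other genome. The function names, file names and demo values are hypothetical. Pairs missing from the table are treated as below the cutoff, since fastANI does not report pairs whose ANI is far below its detection range.

```python
import csv
import io
from collections import defaultdict

ANI_THRESHOLD = 95.0  # species cutoff stated in the Methods


def read_fastani(handle):
    """Yield (query, reference, ani) tuples from fastANI tab-separated output."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 3:
            yield row[0], row[1], float(row[2])


def flag_outliers(pairs, genomes, threshold=ANI_THRESHOLD):
    """Return genomes whose ANI is below threshold against every other genome.

    Assumption: a pair that is absent from the table counts as below threshold.
    """
    best = defaultdict(float)  # highest ANI seen per genome; 0.0 if none
    for query, reference, ani in pairs:
        if query == reference:
            continue  # skip self-comparisons
        best[query] = max(best[query], ani)
        best[reference] = max(best[reference], ani)
    return sorted(g for g in genomes if best[g] < threshold)


# Hypothetical demo table: a.fna and b.fna agree, c.fna matches neither.
demo = io.StringIO(
    "a.fna\ta.fna\t100.0\t1500\t1500\n"
    "a.fna\tb.fna\t98.7\t1400\t1500\n"
    "b.fna\ta.fna\t98.6\t1390\t1480\n"
    "a.fna\tc.fna\t88.2\t900\t1500\n"
    "c.fna\tb.fna\t88.0\t880\t1450\n"
)
flagged = flag_outliers(read_fastani(demo), ["a.fna", "b.fna", "c.fna"])
print(flagged)
```

In a real dataset, genomes flagged this way would still need manual review, for example against a type-strain reference or with the 16S rRNA and MLST checks mentioned in the abstract, because a group of mislabelled genomes can resemble each other and escape this simple rule.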
Attached file: 20220621_BITS2022.pdf (Adobe PDF, 4.43 MB; authorized users only; type: other attached material; license: not specified)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1069766