Pitfalls in bacteria recognition: a 16S rRNA gene sequencing-based simulation study on the oral microbiome environment

Locatelli, Elena
2023-01-01

Abstract

Microbiome research has evolved rapidly and become an important topic in recent years. Studying the microbiome allows us to investigate the bacteria living at specific body sites and to understand their role in host health and disease by observing the structure, function, and interactions of the different components. Amplicon sequencing is the most widely used technique for profiling the diversity and composition of the microbiota. In particular, the 16S rRNA gene is the marker gene commonly adopted to identify microbial communities within the human host and to collect information on their relative abundance. In this thesis we present a bioinformatic analysis of the 16S rRNA gene that performs a taxonomic classification of bacteria residing in the human oral cavity and simulates a microbiome environment in order to dissect and point out the weak points of an analysis carried out by amplicon sequencing. To detect the microbial composition in 4 orally healthy individuals, the analysis was performed across the 16S rRNA gene by sequencing 6 amplicons, each covering 2 or 3 consecutive hypervariable regions. We used the Divisive Amplicon Denoising Algorithm 2 (DADA2), which infers the bacterial composition of samples by detecting Amplicon Sequence Variants (ASVs) and thus offers single-nucleotide resolution. The public expanded Human Oral Microbiome Database (eHOMD) was adopted as the reference database for assigning taxonomy to the bacterial sequences. All the 16S rRNA gene amplicon regions were compared to determine which one best identified the microbial communities. The second and most relevant part of the study built on the results of this amplicon region analysis, which served as the reference panel.
We developed a 16S rRNA gene-based simulator to mimic a marker gene sequencing process and to evaluate the pitfalls that may be encountered in a targeted metagenomic analysis. We selected the full-length 16S rRNA gene sequences from eHOMD and subdivided them into the different amplicon regions. A taxonomic classification was then carried out to assign a taxonomic rank to the simulated amplicons. Bacterial species that remained taxonomically unassigned underwent multiple sequence alignment, combined with phylogenetic tree construction, against all the full-length 16S rRNA gene reference sequences in eHOMD belonging to the candidate species, in order to investigate the sequence identity between them. Our data showed that the 16S analysis run with DADA2 improved when the marker gene amplicon regions were pooled, identifying 204 different bacterial species. Overall, the analysis of the unique ASV counts of the pooled 16S rRNA gene amplicons showed that approximately 44% of all ASVs reached species-level classification, the most detailed rank. In the single-amplicon analysis, however, the V2V3 amplicon region was the portion of the 16S rRNA gene that recognized the highest number of bacteria at the species level (135). The simulation process, in which amplicon sequences were extracted from the full-length 16S rRNA gene sequences of the simulated bacteria, showed that not all amplicons could be extracted, suggesting that several 16S rRNA gene amplicon sequences would not be amplified, as often happens in real experiments. Furthermore, several bacterial species appear to share a high degree of sequence similarity in a given region of the 16S rRNA gene, which makes classification incomplete in some cases. The simulator will therefore be useful for understanding the sensitivity of the different regions of the 16S rRNA gene in recognizing the presence of pathogenic bacterial species.
Our study can then be used to build an accurate catalog of oral ASVs for assaying large sample sets (e.g., hundreds of individuals), with the aim of improving taxonomic classification performance.
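
The two pitfalls reported for the simulation (amplicons that cannot be extracted, and species that are indistinguishable within a region) can be illustrated with a minimal Python sketch. This is not the simulator developed in the thesis: the primers, sequences, and function names below are hypothetical toy values chosen for illustration, and primer matching is assumed to be exact apart from IUPAC degenerate bases.

```python
import re
from collections import defaultdict

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(full_length, forward, reverse):
    """Return the region between the two primer sites (primers excluded).

    Returns None when either primer site is missing, i.e. the amplicon
    cannot be extracted from this reference sequence.
    """
    seq = full_length.upper().replace("U", "T")
    fwd = re.search(primer_regex(forward), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


def ambiguous_groups(amplicons):
    """Group taxa whose extracted amplicons are identical."""
    groups = defaultdict(list)
    for name, amplicon in amplicons.items():
        if amplicon is not None:
            groups[amplicon].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical toy data: NOT real 16S primers or eHOMD sequences.
TOY_FORWARD = "ACGYT"
TOY_REVERSE = "GGWTC"
TOY_REFERENCES = {
    "species_A": "TTACGCTAAAGGGTTTGATCCAA",
    "species_B": "CCACGTTAAAGGGTTTGAACCGG",  # same region as species_A
    "species_C": "TTACGGTAAAGGGTTTGATCCAA",  # forward primer site mutated
}

amplicons = {
    name: extract_amplicon(seq, TOY_FORWARD, TOY_REVERSE)
    for name, seq in TOY_REFERENCES.items()
}
not_extracted = [name for name, amp in amplicons.items() if amp is None]
shared = ambiguous_groups(amplicons)
```

In this toy set, `species_C` yields no amplicon because its forward primer site does not match, while `species_A` and `species_B` yield the same amplicon and therefore could not be told apart from that region alone. A realistic simulator would also need to tolerate primer mismatches and handle multiple 16S copies per genome, which this sketch does not attempt.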
2023
Oral microbiome, simulation, 16S rRNA gene, taxonomy
Files in this record:
Tesi_Dottorato_Locatelli_univr.pdf

Open Access from 01/10/2024

Description: Doctoral thesis
Type: Doctoral thesis
License: Not specified
Size: 6.82 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1102946