16S rRNA gene amplicons and taxonomic classification of the oral microbiome

Locatelli, Elena; Luciano, Umberto; De Santis, Daniele; Patuzzo, Cristina; Malerba, Giovanni

The term microbiota refers to a community of microorganisms, considered as a living ecosystem in which the growth and survival of all its members change continuously. The microbiome is the collective set of genomes of these microorganisms. The human microbiota is estimated to contain about 10^14 commensal bacterial cells. Current high-throughput sequencing technologies have enabled genome-based methods for classifying bacteria and for understanding the functional role of the microbiota and its interaction with the host. In this study we explore the ability of a gene-based sequencing method to classify bacteria of the oral microbiome, the second largest microbial community in the human body after the gut. The method relies on detecting sequence variants in the bacterial 16S rRNA gene (~1,500 bp), which is present in all bacterial genomes. This gene includes nine hypervariable regions (V1-V9) whose sequences differ among bacterial species, so the variability of this gene can be used to assign bacteria to the proper taxonomic groups. Because sequencing a single hypervariable region cannot capture the variability of the entire gene, at least two hypervariable regions are generally studied. In gut studies the V3 and V4 regions are the most commonly analyzed, but this may not be the best choice for oral microbiome studies. Here, we investigate all nine hypervariable regions (6 amplicons) and how their characterization affects the overall taxonomic classification at different taxonomic levels. This also reveals how specific each hypervariable region (or combination of regions) is for identifying bacterial species. We collected 4 buccal swab samples from healthy individuals; the extracted DNA was sequenced according to the QIAseq 16S/ITS panel handbook on an Illumina MiSeq NGS platform, producing ~200,000 paired-end reads (2 × 276 bp) per sample.
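As a small illustration of why several amplicons are needed to cover the whole gene, the following Python sketch counts how many amplicons span each hypervariable region. It assumes the six amplicons are V1-V2, V2-V3, V3-V4, V4-V5, V5-V7 and V7-V9; only V1-V2 and V2-V3 are named in this abstract, so the other four boundaries are an assumption, and `region_coverage` is a hypothetical helper rather than part of any pipeline used in the study.

```python
from collections import Counter
from typing import Dict, List

# ASSUMPTION: amplicon boundaries of a 6-amplicon panel spanning V1-V9.
# Only V1-V2 and V2-V3 are named in the abstract; the other four are assumed.
AMPLICONS: Dict[str, List[int]] = {
    "V1V2": [1, 2],
    "V2V3": [2, 3],
    "V3V4": [3, 4],
    "V4V5": [4, 5],
    "V5V7": [5, 6, 7],
    "V7V9": [7, 8, 9],
}


def region_coverage(amplicons: Dict[str, List[int]], n_regions: int = 9) -> Counter:
    """Hypothetical helper: count how many amplicons span each region V1..Vn."""
    # Start every region at 0 so uncovered regions are still reported.
    counts = Counter({f"V{i}": 0 for i in range(1, n_regions + 1)})
    for regions in amplicons.values():
        counts.update(f"V{i}" for i in regions)
    return counts


full_panel = region_coverage(AMPLICONS)
v3v4_only = region_coverage({"V3V4": AMPLICONS["V3V4"]})  # typical gut-study choice

print([r for r, n in full_panel.items() if n == 0])  # -> []
print([r for r, n in v3v4_only.items() if n == 0])
# -> ['V1', 'V2', 'V5', 'V6', 'V7', 'V8', 'V9']
```

Under this assumed layout every region is covered at least once (V2 to V5 and V7 twice), whereas a V3-V4 amplicon alone leaves seven of the nine regions unsequenced.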
We carried out the analysis in two ways for each sample: (1) by processing the data from all 16S amplicons together, and (2) by processing each amplicon individually and then combining the results. Amplicon analyses were performed using the Divisive Amplicon Denoising Algorithm (DADA2), which infers the amplicon sequence variants (ASVs) present in each sample and reports their abundance. ASVs were then classified using a training set of oral bacterial reference sequences (Human Oral Microbiome Database, version 15.1), slightly modified to meet DADA2 requirements. The classification efficiency and accuracy (at the genus or species level) of the ASVs from each hypervariable region were then assessed. This analysis highlights the hypervariable regions that capture the greatest gene variability in the oral microbiome. The ten most common species detected by each of the 6 amplicons were also reported for comparison. We identified about 90 genera and more than 200 species; of the 9 phyla identified, Proteobacteria was the most abundant (~56%). Of the 2600 unique ASVs observed across the 4 samples, 1147 were classified at the species level (overall classification rate: 44.1%). Overall, 204 different species were identified when all amplicons were processed together, whereas 206 were identified by combining the results of the individually processed amplicons. The V1-V2 and V2-V3 amplicons identified the most species, about 134 and 135 respectively, 101 of which were shared. All individual amplicons yielded almost the same ten most frequent species. Moreover, each region detected some bacterial species that were not detected by any of the other 16S regions.
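The classification rate and the per-amplicon species comparisons reported above reduce to simple ratio and set operations. The following Python sketch shows one way to compute them; the helper names (`classification_rate`, `compare_amplicons`) and the toy ASV and species labels are hypothetical placeholders, not code or data from this study.

```python
from typing import Dict, Optional, Set, Tuple


def classification_rate(species_of_asv: Dict[str, Optional[str]]) -> float:
    """Fraction of ASVs with a species-level assignment (None = unassigned)."""
    if not species_of_asv:
        return 0.0
    assigned = sum(1 for sp in species_of_asv.values() if sp is not None)
    return assigned / len(species_of_asv)


def compare_amplicons(
    species_by_amplicon: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the union of species over all amplicons and, for each amplicon,
    the species that no other amplicon detected."""
    union: Set[str] = set().union(*species_by_amplicon.values())
    only_here: Dict[str, Set[str]] = {}
    for amp, species in species_by_amplicon.items():
        others = set().union(
            *(s for a, s in species_by_amplicon.items() if a != amp)
        )
        only_here[amp] = species - others
    return union, only_here


# Hypothetical toy input: placeholder ASV and species names, not study data.
asv_calls = {"asv1": "species_A", "asv2": None, "asv3": "species_B", "asv4": None}
species_by_amplicon = {
    "V1V2": {"species_A", "species_B", "species_C"},
    "V2V3": {"species_A", "species_B", "species_D"},
    "V3V4": {"species_A", "species_E"},
}

print(f"toy classification rate: {classification_rate(asv_calls):.1%}")  # -> 50.0%
print(f"rate reported in the abstract: {1147 / 2600:.1%}")  # -> 44.1%

union, only_here = compare_amplicons(species_by_amplicon)
shared = species_by_amplicon["V1V2"] & species_by_amplicon["V2V3"]
print(len(union), sorted(shared))  # -> 5 ['species_A', 'species_B']
for amp, species in only_here.items():
    print(amp, sorted(species))  # species detected by this amplicon only
```

With real data, `species_by_amplicon` would hold the species-level assignments obtained for each amplicon; the intersection of two such sets yields counts like the 101 species shared by V1-V2 and V2-V3, and the per-amplicon differences identify region-specific species.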
In conclusion, studying all nine hypervariable regions of the 16S gene is ~1.7 times more informative than studying only one or two regions, and some species can be recognized only when specific regions are studied. However, it remains unclear how data from different regions should be combined to estimate the relative abundances of bacterial species within each sample.
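To make this open question concrete, the following Python sketch contrasts two naive ways of turning per-amplicon read counts into a single relative-abundance profile for one sample: pooling raw counts across amplicons, or averaging the relative abundances computed within each amplicon. Neither is presented as the authors' method, and neither corrects for primer bias or 16S copy-number variation; the function names and toy counts are hypothetical.

```python
from collections import Counter
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts for one amplicon into proportions that sum to 1."""
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}


def pooled_relative_abundance(per_amplicon: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Option 1 (hypothetical): add raw counts across amplicons, normalize once."""
    pooled: Counter = Counter()
    for counts in per_amplicon.values():
        pooled.update(counts)
    return relative_abundance(dict(pooled))


def mean_relative_abundance(per_amplicon: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Option 2 (hypothetical): normalize within each amplicon, then average,
    treating a species missing from an amplicon as 0 in that amplicon."""
    profiles = [p for p in (relative_abundance(c) for c in per_amplicon.values()) if p]
    species = set().union(*profiles)
    return {
        sp: sum(p.get(sp, 0.0) for p in profiles) / len(profiles)
        for sp in sorted(species)
    }


# Hypothetical read counts for one sample (placeholder names, not study data).
sample = {
    "V1V2": {"species_A": 60, "species_B": 40},
    "V2V3": {"species_A": 90, "species_C": 210},
}

pooled_profile = pooled_relative_abundance(sample)
mean_profile = mean_relative_abundance(sample)
print({sp: round(v, 3) for sp, v in sorted(pooled_profile.items())})
# -> {'species_A': 0.375, 'species_B': 0.1, 'species_C': 0.525}
print({sp: round(v, 3) for sp, v in mean_profile.items()})
# -> {'species_A': 0.45, 'species_B': 0.2, 'species_C': 0.35}
```

With these toy counts the two profiles differ (species_A: 0.375 pooled vs 0.45 averaged) because the two amplicons have different sequencing depths, which illustrates why the choice of combination strategy matters.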