Probiotics R&D: from genome sequences to regulatory issues
DEL CASALE, Antonio; SALVETTI, Elisa; FELIS, Giovanna; TORRIANI, Sandra; FRACCHETTI, Fabio
2014-01-01
Abstract
Over the last decade, the introduction of Next Generation Sequencing (NGS) technologies has dramatically increased the number of genome sequencing projects, because sequencing is now faster and cheaper: by December 2013, more than 19 000 prokaryotic genome projects, including probiotic strains, had been submitted to the NCBI database. Although NGS has become widely accessible, also as an outsourcing service affordable for small and medium-sized enterprises (SMEs), producing sequencing data is only the first step toward the comprehensive safety assessment and evidence-based health claims for probiotics required by regulatory authorities. Thanks to remarkable advances in prokaryotic genomics, many authors are developing new strategies as alternatives to phenotypic assays, e.g. High-throughput Multilocus Sequence Typing (HiMLST) and the identification of acquired resistance genes with comparative genomics tools (Boers et al., 2012; Zankari et al., 2012). Despite this potential, some critical issues can mislead or limit the use of NGS technologies as an alternative to phenotypic characterization. The first critical issue is that genome sequences are obtained with different NGS technologies and different protocols for preparing the DNA libraries. Moreover, they can be released and submitted to databases at different assembly levels (e.g. as draft or complete genomes). Among the differences between NGS technologies, the main challenges stem from sequencing errors and read length: sequencing errors can lead to wrong genotyping and genome annotation, while short reads are problematic when repetitive sequences are present. During the assembly of raw reads, distinct genomic regions may be wrongly joined across a repeat, because reads shorter than the repeat cannot show which flanking regions belong together. Repeats occur in almost all genomes, and they are often associated with acquired genetic elements.
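The read-length problem can be made concrete with a toy Python sketch that lists every read-length window occurring more than once in a sequence. It is an illustration under simplifying assumptions, not an assembler: reads are assumed error-free and drawn from one strand only (reverse complements are ignored), and the function name `ambiguous_kmers` is hypothetical. A window found at several positions cannot be placed uniquely, which is how mis-joins across a repeat arise; in the toy sequence below, reads longer than the 7 bp repeat are all unique.

```python
from collections import defaultdict
from typing import Dict, List


def ambiguous_kmers(genome: str, read_len: int) -> Dict[str, List[int]]:
    """Return each window of length read_len that occurs more than once in
    genome, mapped to its 0-based start positions.

    Simplifications (assumptions): error-free reads, forward strand only.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    # Windows seen at two or more positions cannot be placed uniquely.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy sequence with two copies of the 7 bp repeat "GATTACA".
genome = "CCC" + "GATTACA" + "TTT" + "GATTACA" + "GGG"

print(ambiguous_kmers(genome, 5))  # -> {'GATTA': [3, 13], 'ATTAC': [4, 14], 'TTACA': [5, 15]}
print(ambiguous_kmers(genome, 8))  # -> {}
```

With 5 bp reads, three windows map to both repeat copies, so an assembler relying on these reads alone could join the "CCC" flank to the "GGG" flank; with 8 bp reads every window spans a unique flanking base and the ambiguity disappears.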
Mis-assemblies can be revealed by Whole Genome Mapping (Onmus-Leone et al., 2013), which is also expected to become the next gold standard for bacterial strain authentication (Miller, 2013). In comparative genomics, the main issues derive from i) the incorrect identification of strains and ii) errors in annotation and comparison. As for the first point, correct nomenclature of bacterial species is part of the fundamental language that allows microbiologists to communicate with each other and with other disciplines. Nevertheless, genome sequences are still commonly deposited under an incorrect species name. This is the case for some species of the genus Lactobacillus, which includes the majority of bacteria used in food and many probiotics. Based on these data, a systematic review of how genome sequence data are deposited is strongly encouraged, so that new bacterial genome sequences are correctly characterized before their announcement. As for the second point, in silico analysis showed that all bacterial genomes contain a low percentage of open reading frames (ORFs) with undetected frameshifts and in-frame stop codons, which may either be genuinely present in the organism or result from sequencing errors. Errors introduced at the first stages (genome sequencing and gene prediction) can in fact propagate to annotation and comparison. To bypass most issues related to sequencing errors, the Duplex Consensus Sequence (DCS) approach, which consists of two different sequencing projects, has recently been proposed (Kirsch & Klein, 2012). In conclusion, to obtain accurate genomic data and more information, a multi-methodological approach that validates data with different technologies is encouraged, and whole genome optical mapping is a suitable solution for validating assemblies.
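How a single sequencing error can produce an in-frame stop codon is shown by the minimal Python check below on a predicted coding sequence. The function `check_cds` is hypothetical: it reads the sequence in one frame from its first base and assumes the bacterial genetic code (NCBI translation table 11, stop codons TAA, TAG, TGA), ignoring alternative start codons and programmed frameshifts. Deleting one base from an intact toy gene is enough to create a premature stop, and the check by itself cannot tell a genuine pseudogene from a sequencing error.

```python
# Stop codons of the standard and bacterial genetic codes (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str) -> dict:
    """Flag simple signs of a frameshift or premature stop in a predicted
    coding sequence, read in a single frame starting at its first base."""
    cds = cds.upper()
    n_full = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, n_full, 3)]
    return {
        # A length that is not a multiple of 3 suggests an insertion or deletion.
        "length_multiple_of_3": len(cds) % 3 == 0,
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        # 0-based offsets of stop codons found before the last full codon.
        "internal_stop_offsets": [3 * k for k, c in enumerate(codons[:-1]) if c in STOP_CODONS],
    }


intact = "ATGGTGACCAAATAA"         # ATG GTG ACC AAA TAA
shifted = intact[:3] + intact[4:]  # same sequence with the G at offset 3 deleted

print(check_cds(intact))
# -> {'length_multiple_of_3': True, 'ends_with_stop': True, 'internal_stop_offsets': []}
print(check_cds(shifted))
# -> {'length_multiple_of_3': False, 'ends_with_stop': False, 'internal_stop_offsets': [3]}
```

In practice, an ORF flagged this way would be checked against the underlying reads or an independent sequencing technology before concluding that the gene is truly disrupted.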