Towards pocket-sized genomic analyses: cross-platform benchmark of multi-organism genomic data indexing and alignment
Treccani M; Veschetti L; Malerba G
2022-01-01
Abstract
The current socio-economic situation and the international objectives set by the United Nations (2030 Agenda for Sustainable Development) underline the urgency of low-cost and environmentally friendly computational alternatives. Moreover, in recent years the bioinformatics community has shown renewed interest in Raspberry Pi (RPi) applications in teaching and research projects. In the context of the BioVRPi project, which aims to develop and offer a low-cost, stable and tested bioinformatics environment, we propose an exploratory cross-platform benchmark of multi-organism genomic analyses. Indexing and alignment were benchmarked on the following devices: an RPi 4 (Raspberry Pi OS 04-04-2022) with 8 GB RAM and HDD storage; a laptop (macOS Big Sur v11.2.3) with a 2 GHz quad-core Intel Core i5 processor, 16 GB RAM and SSD storage; and a desktop (Ubuntu 20.04.4 LTS) with a 3 GHz octa-core Intel Core i7 processor, 32 GB RAM and HDD storage. Performance was assessed on SARS-CoV-2, Escherichia coli and Caenorhabditis elegans genome sequences (respective RefSeq accessions: GCF_009858895.2, GCF_000005845.2, GCF_000002985.6), since they present different degrees of genomic complexity: virus, bacterium, and nematode. To minimize variability and possible biases due to the sequencing technology used, sample reads were generated in silico from the respective reference genomes using ART Illumina v2.5.8 with the following parameters: read length 150, paired end, coverage 30X, mean fragment length 200, standard deviation 10, HiSeqX v2.5 TruSeq built-in profile. Indexing and alignment were performed with three alignment tools, BWA v0.7.17-r1188, Bowtie2 v2.4.5 and Minimap2 v2.17, using default parameters and scaling from 1 to 4 threads. Runtimes were measured with Hyperfine v1.13.0, using 3 warmup runs followed by 10 timed runs for each process.
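The setup described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual pipeline: the file names, the template dictionaries and the helper functions are hypothetical, and the exact tool invocations are assumptions. In particular, the abstract states that default parameters were used, so the `-ax sr` short-read preset for Minimap2 is an assumption added here only to obtain SAM output for paired short reads.

```python
import shlex


def art_command(ref_fasta: str, out_prefix: str) -> list[str]:
    """ART Illumina call mirroring the simulation parameters in the abstract."""
    return [
        "art_illumina",
        "-ss", "HSXt",   # built-in HiSeqX TruSeq profile
        "-i", ref_fasta,
        "-p",            # paired-end reads
        "-l", "150",     # read length
        "-f", "30",      # fold coverage
        "-m", "200",     # mean fragment length
        "-s", "10",      # fragment length standard deviation
        "-o", out_prefix,
    ]


# Hypothetical command templates. "{{threads}}" survives str.format() as
# "{threads}", which hyperfine then fills in during its parameter scan.
INDEX_TEMPLATES = {
    "bwa": "bwa index {ref}",  # bwa index has no thread option
    "bowtie2": "bowtie2-build --threads {{threads}} {ref} {ref}",
    "minimap2": "minimap2 -t {{threads}} -d {ref}.mmi {ref}",
}

ALIGN_TEMPLATES = {
    "bwa": "bwa mem -t {{threads}} -o {out}.sam {ref} {r1} {r2}",
    "bowtie2": "bowtie2 -p {{threads}} -x {ref} -1 {r1} -2 {r2} -S {out}.sam",
    # Assumption: "-ax sr" (short-read preset, SAM output) is not stated in the abstract.
    "minimap2": "minimap2 -t {{threads}} -ax sr -o {out}.sam {ref} {r1} {r2}",
}


def hyperfine_command(template: str, json_out: str, **paths: str) -> list[str]:
    """Wrap one tool invocation in hyperfine: 3 warmup runs, 10 timed runs."""
    quoted = {name: shlex.quote(value) for name, value in paths.items()}
    shell_cmd = template.format(**quoted)
    cmd = ["hyperfine", "--warmup", "3", "--runs", "10"]
    if "{threads}" in shell_cmd:
        # Scan 1 to 4 threads only when the tool accepts a thread count.
        cmd += ["--parameter-scan", "threads", "1", "4"]
    cmd += ["--export-json", json_out, shell_cmd]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(...) to execute it.
    print(shlex.join(art_command("ref.fa", "sim")))
    for tool, template in ALIGN_TEMPLATES.items():
        cmd = hyperfine_command(template, f"{tool}_align.json",
                                ref="ref.fa", r1="sim1.fq", r2="sim2.fq", out=tool)
        print(shlex.join(cmd))
```

Building the commands as lists keeps the sketch free of side effects and easy to inspect before running anything on a slow device.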
We performed a cross-platform benchmark of multi-organism genome indexing and short-read alignment to evaluate the RPi as a viable alternative to common bioinformatics devices. To assess its performance, we tested some of the most widely used alignment tools on SARS-CoV-2, E. coli and C. elegans genomic data (respective genome sizes: 29.9 kbp, 4.6 Mbp, 100.3 Mbp). The computational times for indexing and alignment are reported in Table 1. For indexing, we observed comparable runtimes between the RPi and the other platforms using BWA and Bowtie2 for SARS-CoV-2 and E. coli, whereas Minimap2 indexing runtimes were one order of magnitude higher on the RPi. Nonetheless, Minimap2 showed the fastest indexing runtimes overall. In addition, we found an increase of one order of magnitude in RPi runtimes for C. elegans with all considered tools, even though differences in runtimes across platforms proved stable across organisms. For alignment, we observed consistent runtime differences across all organisms and tools. Overall, Minimap2 was the fastest, whereas Bowtie2 performed poorly on all platforms, and its inefficiency was exacerbated on the RPi. Even though BWA seems to work more efficiently on the RPi than on the desktop for SARS-CoV-2 data, the desktop and laptop performed better on more complex organisms, as expected. The benchmark considered multi-threading up to 4 threads, the maximum available on the RPi. For Bowtie2 indexing, multi-threading proved effective for C. elegans data but gave no runtime improvement for SARS-CoV-2 and E. coli. Conversely, alignment performed best with multi-threading, as expected. In conclusion, the RPi showed promising results, proved to be a viable low-cost and environmentally friendly alternative for genomic data analysis on different organisms, and turned out to be particularly efficient for microorganisms.
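Statements such as "one order of magnitude" can be checked from per-platform timing reports. The sketch below is an illustration under assumptions rather than the analysis actually used: it assumes each platform's timings were exported with hyperfine's `--export-json` option, and the function names are hypothetical.

```python
import json
import math


def mean_runtimes(json_path: str) -> dict[str, float]:
    """Map each benchmarked command to its mean wall-clock time in seconds."""
    with open(json_path) as handle:
        report = json.load(handle)
    # hyperfine stores one entry per benchmarked command under "results".
    return {entry["command"]: entry["mean"] for entry in report["results"]}


def slowdown(rpi_seconds: float, reference_seconds: float) -> float:
    """Runtime ratio of the RPi relative to a reference platform."""
    if rpi_seconds <= 0 or reference_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return rpi_seconds / reference_seconds


def orders_of_magnitude(ratio: float) -> int:
    """Nearest power of ten of a runtime ratio (1 means roughly 10x slower)."""
    return round(math.log10(ratio))
```

Comparing the same command across two reports, for example the RPi and the desktop, then reduces to looking up both means and passing them to `slowdown`; rounding the base-10 logarithm of the ratio is only one possible reading of "order of magnitude".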
Further advances and tool optimization for the RPi ARM architecture will lead to greater scalability for complex organisms and will be pursued by the BioVRPi project in future exploratory analyses.

File | Access | License | Size | Format |
---|---|---|---|---|
20220629_BITS2022_TreccaniVeschettiMalerba_TowardsPocketSizedGenomicAnalyses_Presentazione_Poster.pdf | Open access | Public domain | 1.44 MB | Adobe PDF |