Bucket Elimination (BE) is a framework that encompasses several algorithms, including Belief Propagation (BP) and variable elimination for Constraint Optimisation Problems (COPs). BE has significant computational requirements that can be addressed by using GPUs to parallelise its fundamental operations, i.e., composition and marginalisation, which operate on functions represented by large tables. We propose a novel approach to parallelise these operations with GPUs, which optimises the table layout so to achieve better performance in terms of increased speedup and scalability. Our approach allows us to process incomplete tables (i.e., tables with some missing variables assignments), which often occur in several practical applications (such as the ones we consider in our dataset). Finally, we can process tables that are larger than the GPU memory. Our approach outperforms the state-of-the-art technique to parallelise BP on GPUs, achieving better speedups (up to +466% w.r.t. such parallel technique). We test our method on a publicly available COP dataset, measuring a speedup up to 696.02x w.r.t. the sequential version. The ability of our technique to process large tables is crucial in this scenario, in which most of the instances generate tables larger than the GPU memory, and hence they cannot be solved with previous GPU techniques related to BE.
An Efficient Approach for Accelerating Bucket Elimination on GPUs
Bistaffa, Filippo;BOMBIERI, Nicola;FARINELLI, Alessandro
2017-01-01
Abstract
Bucket Elimination (BE) is a framework that encompasses several algorithms, including Belief Propagation (BP) and variable elimination for Constraint Optimisation Problems (COPs). BE has significant computational requirements that can be addressed by using GPUs to parallelise its fundamental operations, i.e., composition and marginalisation, which operate on functions represented by large tables. We propose a novel approach to parallelise these operations with GPUs, which optimises the table layout so to achieve better performance in terms of increased speedup and scalability. Our approach allows us to process incomplete tables (i.e., tables with some missing variables assignments), which often occur in several practical applications (such as the ones we consider in our dataset). Finally, we can process tables that are larger than the GPU memory. Our approach outperforms the state-of-the-art technique to parallelise BP on GPUs, achieving better speedups (up to +466% w.r.t. such parallel technique). We test our method on a publicly available COP dataset, measuring a speedup up to 696.02x w.r.t. the sequential version. The ability of our technique to process large tables is crucial in this scenario, in which most of the instances generate tables larger than the GPU memory, and hence they cannot be solved with previous GPU techniques related to BE.File | Dimensione | Formato | |
---|---|---|---|
ieeetran.pdf
accesso aperto
Tipologia:
Documento in Pre-print
Licenza:
Creative commons
Dimensione
764.66 kB
Formato
Adobe PDF
|
764.66 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.