Implementing a graph traversal (GT) algorithm for GPUs is a very challenging task. It is a core primitive for many graph analysis applications and its efficiency strongly impacts on the overall application performance. Different strategies have been proposed to implement the GT algorithm by exploiting the GPU characteristics. Nevertheless, the efficiency of each of them strongly depends on the graph characteristics. This paper presents an analysis of the most important features of the parallel GT algorithm, which include frontier queue management, load balancing, duplicate removing, and synchronization during graph traversal iterations. It shows different techniques to implement each of such features for GPUs and the comparison of their performance when applied on a very large and heterogeneous set of graphs. The results allow identifying, for each feature and among different implementation techniques of them, the best configuration to address the graph characteristics. The paper finally presents how such a configuration analysis and set allow traversing graphs with throughput up to 14,000 MTEPS on single GPU devices, with speedups ranging from 1.2x to 18.5x with regard to the best parallel applications for GT on GPUs at the state of the art.
|Titolo:||Configuring Graph Traversal Applications for GPUs: Analysis of Implementation Strategies and their Correlation with Graph Characteristics|
BOMBIERI, Nicola (Corresponding)
|Data di pubblicazione:||2019|
|Appare nelle tipologie:||04.01 Contributo in atti di convegno|