Computer vision applications have stringent performance constraints that must be satisfied when they are run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de-facto reference standard to develop such applications. OpenVX uses a primitive-based programming model that results in a directed-acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimizations and synthesis to heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance necessary for such applications to be deployed on heterogeneous multi-/many-core platforms. Our work focuses on addressing this challenge with three main contributions: First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance up to 70% on one of the most widespread smart systems for applying computer vision and intelligent video analytics in general at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of a vision application for edge computing where some primitives may have multiple implementations (e.g., for CPU and GPU), can lead to load imbalance amongst heterogeneous computing elements (CEs), thus suffering from degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance up to 33% over HEFT, and 82% over the native OpenVX scheduler. We present the results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.
Task Mapping and Scheduling for OpenVX Applications on Heterogeneous Multi/Many-core Architectures
Francesco Lumpp;Stefano Aldegheri;Nicola Bombieri
2021-01-01
Abstract
Computer vision applications have stringent performance constraints that must be satisfied when they are run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de-facto reference standard to develop such applications. OpenVX uses a primitive-based programming model that results in a directed-acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimizations and synthesis to heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance necessary for such applications to be deployed on heterogeneous multi-/many-core platforms. Our work focuses on addressing this challenge with three main contributions: First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance up to 70% on one of the most widespread smart systems for applying computer vision and intelligent video analytics in general at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of a vision application for edge computing where some primitives may have multiple implementations (e.g., for CPU and GPU), can lead to load imbalance amongst heterogeneous computing elements (CEs), thus suffering from degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance up to 33% over HEFT, and 82% over the native OpenVX scheduler. We present the results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.