CATALOGO DEI PRODOTTI DELLA RICERCA

Computer vision applications have stringent performance constraints that must be satisfied when they are run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de-facto reference standard to develop such applications. OpenVX uses a primitive-based programming model that results in a directed-acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimizations and synthesis to heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance necessary for such applications to be deployed on heterogeneous multi-/many-core platforms. Our work focuses on addressing this challenge with three main contributions: First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance up to 70% on one of the most widespread smart systems for applying computer vision and intelligent video analytics in general at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of a vision application for edge computing where some primitives may have multiple implementations (e.g., for CPU and GPU), can lead to load imbalance amongst heterogeneous computing elements (CEs), thus suffering from degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance up to 33% over HEFT, and 82% over the native OpenVX scheduler. We present the results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.

Task Mapping and Scheduling for OpenVX Applications on Heterogeneous Multi/Many-core Architectures

Francesco Lumpp;Stefano Aldegheri;Hiren Patel;Nicola Bombieri

2021-01-01

Abstract

Computer vision applications have stringent performance constraints that must be satisfied when they are run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de-facto reference standard to develop such applications. OpenVX uses a primitive-based programming model that results in a directed-acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimizations and synthesis to heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance necessary for such applications to be deployed on heterogeneous multi-/many-core platforms. Our work focuses on addressing this challenge with three main contributions: First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance up to 70% on one of the most widespread smart systems for applying computer vision and intelligent video analytics in general at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of a vision application for edge computing where some primitives may have multiple implementations (e.g., for CPU and GPU), can lead to load imbalance amongst heterogeneous computing elements (CEs), thus suffering from degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance up to 33% over HEFT, and 82% over the native OpenVX scheduler. We present the results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2021
			
	Parole chiave
	
				OpenVX
static mapping and scheduling
			
	Parole chiave
	
				Embedded vision applications
			
	Appare nelle tipologie:
	
				01.01 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
1_TCOMP21.pdf accesso aperto Licenza: Copyright dell'editore Dimensione 3.59 MB Formato Adobe PDF Visualizza/Apri	3.59 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1037406

Citazioni

ND

9

7

social impact