On the Task Mapping and Scheduling for DAG-based Embedded Vision Applications on Heterogeneous Multi/Many-core Architectures

Stefano Aldegheri; Nicola Bombieri; Hiren Dhanji Patel

2020-01-01

Abstract

Embedded vision applications have stringent performance constraints that must be satisfied when they run on low-power embedded systems. OpenVX has emerged as the de facto reference standard for developing such applications. Starting from a directed acyclic graph (DAG) representation of the application and relying on a primitive-based programming model, it allows for automatic system-level optimizations and synthesis of an implementation onto the target heterogeneous multi-core architecture. However, the state-of-the-art algorithm for task mapping and scheduling in OpenVX does not provide the performance necessary for such applications when deployed on embedded multi-/many-core architectures. Our work addresses this challenge by making the following three contributions. First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT improves system performance by up to 70% on one of the most widespread embedded vision systems (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of an embedded vision application where some primitives may have multiple implementations (e.g., for CPU and for GPU), can lead to load imbalance among heterogeneous computing elements (CEs), thereby degrading performance. Third, we propose an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single-implementation primitives to improve load balancing. We show that XEFT can further improve system performance by up to 33% over HEFT and by up to 82% over OpenVX. We present results on different benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with the NVIDIA image recognition application based on deep learning.
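To make the HEFT heuristic mentioned in the abstract more concrete, the following is a minimal Python sketch of the general textbook idea, not the authors' OpenVX implementation and not XEFT. It assumes strictly positive execution times, a hypothetical per-edge transfer cost that applies only when producer and consumer run on different CEs, and a simplified policy that appends each task at the end of a CE's queue (classic HEFT also searches idle gaps). The function name `heft`, the task names, and all timing values are illustrative assumptions.

```python
from functools import lru_cache


def heft(costs, succ, comm):
    """Simplified HEFT list scheduling (illustrative sketch only).

    costs[t]     -> {ce: execution time}; a primitive with a single
                    implementation lists only one CE.
    succ[t]      -> list of successor tasks in the DAG.
    comm[(u, v)] -> transfer time paid when u and v run on different CEs.
    Returns {task: (ce, start, finish)}.
    """
    # Build the predecessor lists from the successor lists.
    pred = {t: [] for t in costs}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: average cost of t plus the longest path to an exit task.
        avg = sum(costs[t].values()) / len(costs[t])
        tail = max((comm.get((t, s), 0) + rank(s) for s in succ.get(t, [])),
                   default=0)
        return avg + tail

    # With positive costs, decreasing rank is a valid topological order.
    order = sorted(costs, key=rank, reverse=True)

    ce_free = {}   # time at which each CE becomes idle
    placed = {}    # task -> (ce, start, finish)
    for t in order:
        best = None
        for ce, w in costs[t].items():
            # Inputs are ready once every predecessor has finished and,
            # if it ran on another CE, its data has been transferred.
            ready = max((placed[p][2]
                         + (0 if placed[p][0] == ce else comm.get((p, t), 0))
                         for p in pred[t]), default=0)
            start = max(ready, ce_free.get(ce, 0))
            if best is None or start + w < best[2]:
                best = (ce, start, start + w)
        placed[t] = best
        ce_free[best[0]] = best[2]
    return placed


# Hypothetical four-primitive graph: B exists only for CPU, C only for GPU.
costs = {
    "A": {"cpu": 2, "gpu": 1},
    "B": {"cpu": 3},
    "C": {"gpu": 2},
    "D": {"cpu": 2, "gpu": 1},
}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

schedule = heft(costs, succ, comm)
makespan = max(finish for _, _, finish in schedule.values())
```

In this toy graph the single-implementation primitives B and C pin work to one CE each, which is the kind of situation in which the abstract reports that HEFT can leave the load unbalanced.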

Year: 2020
Keywords: Embedded vision applications; Static mapping and scheduling; OpenVX
Appears in types: 04.01 Contribution in conference proceedings

Files in this product:

File	Size	Format
01_main.pdf (open access; type: pre-print document; license: Creative Commons)	213.28 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1003226
