RESEARCH PRODUCTS CATALOG

A Framework for Optimizing CPU-iGPU Communication on Embedded Platforms

F. Lumpp; Hiren Patel; Nicola Bombieri (Member of the Collaboration Group)

2021-01-01

Abstract

Many modern programmable embedded devices contain CPUs and a GPU that share the same system memory on a single die. Such a unified memory architecture makes it possible to eliminate explicit data copying between the CPU and the integrated GPU (iGPU), which significantly improves performance and saves energy. However, to enable such a "zero-copy" communication model, many devices either implement intricate cache coherence protocols or disable the last-level caches. This often causes severe performance degradation in cache-dependent applications, for which CPU-iGPU data transfer based on standard copy remains the best solution. This paper presents a framework based on a performance model, a set of micro-benchmarks, and a novel zero-copy communication pattern to accurately estimate the potential speedup of a CPU-iGPU application under different communication models (i.e., standard copy, unified memory, or pinned "zero-copy"). It shows how the framework can be combined with standard profiler information to efficiently drive application tuning for a given programmable embedded device.
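The comparison between communication models described in the abstract can be illustrated with a minimal sketch. This is not the paper's performance model: it assumes, purely for illustration, that CPU compute, iGPU compute, and explicit transfers run one after another without overlap, and that each has been timed separately. The names `Timing`, `speedups`, and `best_model`, as well as all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Hypothetical per-run timings (ms) under one communication model."""
    cpu_ms: float       # CPU-side compute time
    gpu_ms: float       # iGPU kernel time
    transfer_ms: float  # explicit CPU<->iGPU copy time (0 when nothing is copied)

    @property
    def total_ms(self) -> float:
        # Assumption: the three phases are sequential and do not overlap.
        return self.cpu_ms + self.gpu_ms + self.transfer_ms


def speedups(timings, baseline="standard_copy"):
    """Return the speedup of every model relative to the baseline model."""
    base = timings[baseline].total_ms
    return {name: base / t.total_ms for name, t in timings.items()}


def best_model(timings):
    """Return the name of the model with the lowest total time."""
    return min(timings, key=lambda name: timings[name].total_ms)


# Made-up numbers, not measurements: zero-copy removes the transfer but
# slows both compute phases, as could happen when caches are disabled.
example = {
    "standard_copy": Timing(cpu_ms=10.0, gpu_ms=20.0, transfer_ms=10.0),
    "unified_memory": Timing(cpu_ms=10.0, gpu_ms=22.0, transfer_ms=0.0),
    "pinned_zero_copy": Timing(cpu_ms=15.0, gpu_ms=35.0, transfer_ms=0.0),
}

if __name__ == "__main__":
    for name, s in speedups(example).items():
        print(f"{name}: {s:.2f}x")
    print("best:", best_model(example))
```

In this toy setting, removing the copy only pays off if the slowdown in the compute phases stays smaller than the transfer time saved, which is the trade-off the abstract points to for cache-dependent applications.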

Year: 2021
Keywords: I/O cache coherence; CPU-GPU communication; Edge computing
Appears in types: 04.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1037404
