DNN Split Computing: Quantization and Run-Length Coding are Enough

Carra, Damiano
2023-01-01

Abstract

Split computing, a recently developed paradigm, capitalizes on the computational resources of end devices to enhance inference efficiency in machine learning (ML) applications. In this approach, the end device processes the input data and transmits intermediate results to a cloud server, which then completes the inference computation. While the main goals of split computing are to reduce latency, minimize energy consumption, and decrease data transfer overhead, minimizing data transmission time remains a challenge. Many existing strategies involve modifying the ML model architecture, which ultimately requires resource-intensive retraining. In our work, we explore lossless and lossy techniques to encode intermediate results without modifying the ML model. Concentrating on image classification and object detection, two prevalent ML applications, we assess the advantages and limitations of each technique. Our findings indicate that simple tools, such as linear quantization and run-length encoding, already achieve considerable data reduction, on par with more complex state-of-the-art techniques that necessitate model retraining. These tools are computationally efficient and do not burden the end device.
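To make the two-stage encoding the abstract describes concrete, the following is a minimal sketch in Python with NumPy. It is an illustration under our own assumptions, not the paper's implementation: the function names, the 8-bit setting, and the toy activation tensor are all hypothetical.

```python
import numpy as np

def linear_quantize(x: np.ndarray, num_bits: int = 8):
    """Lossy step: uniformly map float activations onto num_bits integer levels."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** num_bits - 1)
    if scale == 0.0:
        scale = 1.0  # constant tensor: any scale dequantizes correctly
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale  # lo and scale travel with q so the server can dequantize

def run_length_encode(q: np.ndarray):
    """Lossless step: flatten and encode as (value, run length) pairs."""
    flat = q.ravel()
    change_points = np.flatnonzero(np.diff(flat)) + 1
    starts = np.concatenate(([0], change_points))
    lengths = np.diff(np.concatenate((starts, [flat.size])))
    return list(zip(flat[starts].tolist(), lengths.tolist()))

# Post-ReLU feature maps are mostly zeros, so runs of zeros dominate the encoding.
activations = np.maximum(np.random.randn(16, 32, 32), 0).astype(np.float32)
q, lo, scale = linear_quantize(activations)
pairs = run_length_encode(q)
print(f"raw bytes: {activations.nbytes}, RLE pairs: {len(pairs)}")

# Server side: expand the runs and invert the quantization (exact up to
# the rounding error of one quantization step).
decoded = np.concatenate([np.full(n, v, dtype=np.uint8) for v, n in pairs])
restored = decoded.reshape(activations.shape).astype(np.float32) * scale + lo
```

In a real deployment the (value, run length) pairs would be serialized into a compact byte stream before transmission; the sketch only counts them to show where the reduction comes from.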
Keywords: split computing; ML model
Files in this record:
Carra_Globecom_23.pdf (pre-print, Adobe PDF, 480.78 kB)
License: restricted access (authorized users only)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1120931