RESEARCH PRODUCTS CATALOGUE

Audio-Visual Inpainting: Reconstructing Missing Visual Information with Sound

Sanguineti, Valentina; Thakur, Sanket; Morerio, Pietro; Del Bue, Alessio; Murino, Vittorio

2023-01-01

Abstract

We tackle audio-visual inpainting, the problem of completing an image so that it is consistent with the sound associated with the scene. To this end, we propose a multimodal, audio-visual inpainting method (AVIN) and show how to leverage sound to reconstruct semantically consistent images. AVIN is a two-stage algorithm: it first learns the scene semantics and reconstructs low-resolution images from a conditional probability distribution over the pixel space, conditioned on audio, and then refines this result with a GAN-based network that increases the resolution of the reconstructed image. We show that AVIN is able to recover the original content, especially in hard cases where the missing area heavily degrades the scene semantics. It can also perform cross-modal generation when no visual context is observed at all, reconstructing visual data from sound alone.
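The abstract describes AVIN only at a high level. As a purely illustrative sketch of the two-stage data flow (not the authors' implementation), the Python below replaces both learned networks with trivial stand-ins: the hypothetical `stage1_low_res` fills a low-resolution grid from an audio embedding instead of sampling from a learned audio-conditioned pixel distribution, and the hypothetical `stage2_refine` uses nearest-neighbour upsampling instead of a GAN. The final compositing step, which keeps observed pixels and fills only the masked region, is a common inpainting convention assumed here rather than a detail stated in the abstract.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[bool]]    # True marks a missing pixel to be filled in


def stage1_low_res(audio_embedding: List[float], size: int) -> Image:
    # Hypothetical stand-in for stage 1. AVIN models pixels with a distribution
    # conditioned on audio; this toy version just fills a size x size grid with
    # the mean of the audio embedding so the data flow can be traced.
    value = sum(audio_embedding) / len(audio_embedding)
    return [[value] * size for _ in range(size)]


def stage2_refine(low_res: Image, factor: int) -> Image:
    # Hypothetical stand-in for the GAN-based refinement: nearest-neighbour
    # upsampling by an integer factor (a real generator would add detail).
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in low_res
        for _ in range(factor)
    ]


def composite(observed: Image, mask: Mask, generated: Image) -> Image:
    # Keep observed pixels and take generated ones only where the mask is True.
    return [
        [g if m else o for o, m, g in zip(o_row, m_row, g_row)]
        for o_row, m_row, g_row in zip(observed, mask, generated)
    ]


def audio_visual_inpaint(observed: Image, mask: Mask, audio_embedding: List[float],
                         low_res_size: int = 2, factor: int = 2) -> Image:
    # Stage 1: coarse, audio-conditioned reconstruction at low resolution.
    low_res = stage1_low_res(audio_embedding, low_res_size)
    # Stage 2: raise the resolution to match the observed image.
    high_res = stage2_refine(low_res, factor)
    if len(high_res) != len(observed) or len(high_res[0]) != len(observed[0]):
        raise ValueError("low_res_size * factor must match the image size")
    return composite(observed, mask, high_res)


# Right half missing: the left half is kept, the right half comes from "audio".
image = [[0.0] * 4 for _ in range(4)]
half_mask = [[c >= 2 for c in range(4)] for _ in range(4)]
print(audio_visual_inpaint(image, half_mask, [0.25, 0.75])[0])  # [0.0, 0.0, 0.5, 0.5]
```

With an all-`True` mask nothing visual is observed, so the output depends only on the audio embedding, mirroring in toy form the cross-modal case described in the abstract.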

Year: 2023

Keywords: Multimodal learning; Inpainting; Audio-Visual learning

Appears in item types: 04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1122756
