CATALOGO DEI PRODOTTI DELLA RICERCA

Retrieving clothes which are worn in social media videos (Instagram, TikTok) is the latest frontier of e-fashion, referred to as "video-to-shop" in the computer vision literature. In this paper we present MovingFashion, the first publicly available dataset to cope with this challenge. MovingFashion is composed of 14855 social videos, each one of them associated to e-commerce "shop" images where the corresponding clothing items are clearly portrayed. In addition, we present a network for retrieving the shop images in this scenario, dubbed SEAM Match-RCNN. The model is trained by image-to-video domain adaptation, allowing to use video sequences where only their association with a shop image is given, eliminating the need of millions of annotated bounding boxes. SEAM Match-RCNN builds an embedding, where an attention-based weighted sum of few frames (10) of a social video is enough to individuate the correct product within the first 5 retrieved items in a 14K+ shop element gallery with an accuracy of 80%. This provides the best performance on MovingFashion, comparing exhaustively against the related state-of-the-art approaches and alternative baselines.

MovingFashion: a Benchmark for the Video-to-Shop Challenge

Godi Marco;Joppi Christian;Skenderi Geri;Cristani Marco

2022-01-01

Abstract

Retrieving clothes which are worn in social media videos (Instagram, TikTok) is the latest frontier of e-fashion, referred to as "video-to-shop" in the computer vision literature. In this paper we present MovingFashion, the first publicly available dataset to cope with this challenge. MovingFashion is composed of 14855 social videos, each one of them associated to e-commerce "shop" images where the corresponding clothing items are clearly portrayed. In addition, we present a network for retrieving the shop images in this scenario, dubbed SEAM Match-RCNN. The model is trained by image-to-video domain adaptation, allowing to use video sequences where only their association with a shop image is given, eliminating the need of millions of annotated bounding boxes. SEAM Match-RCNN builds an embedding, where an attention-based weighted sum of few frames (10) of a social video is enough to individuate the correct product within the first 5 retrieved items in a 14K+ shop element gallery with an accuracy of 80%. This provides the best performance on MovingFashion, comparing exhaustively against the related state-of-the-art approaches and alternative baselines.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2022
			
	Parole Chiave
	
				video-to-shop, image retrieval, fashion
			
	Appare nelle tipologie:
	
				04.01 Contributo in atti di convegno

File in questo prodotto:

File	Dimensione	Formato
Godi_MovingFashion_A_Benchmark_for_the_Video-To-Shop_Challenge_WACV_2022_paper.pdf solo utenti autorizzati Tipologia: Documento in Post-print Licenza: Accesso ristretto Dimensione 2.32 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	2.32 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1055777

Citazioni

ND

ND

ND

social impact