Anticipating Next Active Objects for Egocentric Videos

Cigdem Beyan, Vittorio Murino, Alessio del Bue
2024-01-01

Abstract

Active objects are those in contact with the first person in an egocentric video. This paper addresses the task of anticipating the future location of the next active object with respect to a person in a given egocentric video clip. This is challenging because the contact will only occur after the last frame observed by the model, even before any action takes place. Since we aim to estimate the position of objects, the problem is particularly hard when the observed clip and the action segment are separated by the so-called time-to-contact segment. We term this task Anticipating the Next ACTive Object (ANACTO) and introduce a transformer-based self-attention framework to tackle it. We compare our model against relevant anticipation-based baseline methods, and our approach outperforms all of them on three major egocentric datasets: EpicKitchens-100, EGTEA+, and Ego4D. We also conduct an ablation study to assess the effectiveness of the proposed and baseline methods under varying conditions. The code, as well as the ANACTO task annotations for the first two datasets, will be made available upon acceptance of this paper.
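The abstract mentions a transformer-based self-attention framework but does not detail its architecture. Below is a minimal, hypothetical PyTorch sketch of how such an anticipation model could be structured: per-frame features from the observed clip are encoded with self-attention, pooled over time, and regressed to a future bounding box for the next active object. All module names, dimensions, and design choices here are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: self-attention over observed frame features,
# followed by regression of a future bounding box (x, y, w, h).
# Not the paper's architecture; all choices below are assumptions.
import torch
import torch.nn as nn

class NextActiveObjectAnticipator(nn.Module):
    def __init__(self, feat_dim=512, num_heads=8, num_layers=4, max_frames=32):
        super().__init__()
        # Learnable positional embeddings over the observed frames.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_frames, feat_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regress a normalized bounding box (x, y, w, h) in [0, 1].
        self.box_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 4), nn.Sigmoid())

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim), e.g. from a CNN backbone.
        x = frame_feats + self.pos_embed[:, : frame_feats.size(1)]
        x = self.encoder(x)              # self-attention over observed frames
        clip_repr = x.mean(dim=1)        # temporal average pooling
        return self.box_head(clip_repr)  # predicted next-active-object box

# Usage with dummy features for an 8-frame observed clip.
model = NextActiveObjectAnticipator()
dummy = torch.randn(2, 8, 512)
print(model(dummy).shape)  # torch.Size([2, 4])
```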
2024
Videos
Task analysis
Predictive models
Transformers
Detectors
Object recognition
Image analysis
Information analysis
Active perception
Human activity recognition
Egocentric vision
anticipation
next active object
active object
scene understanding
Files in this record:
File: IJ23_Anticipating_Next_Active_Objects_for_Egocentric_Videos.pdf (open access)
Type: Post-print
License: Creative Commons
Size: 19.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1132647