
An augmented representation of activity in video using semantic-context information

CASTELLANI, Umberto;
2014-01-01

Abstract

Learning and recognizing activity in videos is an important task in computer vision, but it remains a challenging problem. In this paper, we propose a new method that combines local and global context information to extract a bag-of-words-like representation of a single space-time point. Each space-time point is described by a bag of visual words that encodes its relationships with the remaining space-time points in the video, defining the space-time context. Experiments on the KTH action recognition benchmark show that our approach achieves accuracy comparable to the state of the art.
Action recognition; semantic shape context; SVM classification
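The abstract and keywords only sketch the pipeline at a high level. The snippet below is a minimal, hypothetical illustration of the general idea, not the authors' implementation: each space-time point is assigned a visual word from a k-means codebook, its "context" is a histogram of the words of all other points in the same video, and videos are classified with an SVM. The codebook size, the pooling step, and the random stand-in descriptors are all assumptions made for the sketch.

```python
# Hypothetical sketch of a bag-of-words space-time context descriptor.
# Real interest-point detection and local descriptors (e.g. HOG/HOF) are
# assumed; random data stands in for them here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def context_histograms(words, n_words):
    """Per-point context: histogram of the visual words of all other points."""
    counts = np.bincount(words, minlength=n_words).astype(float)
    ctx = np.tile(counts, (len(words), 1))
    ctx[np.arange(len(words)), words] -= 1.0   # exclude the point itself
    return ctx / np.maximum(ctx.sum(axis=1, keepdims=True), 1.0)

# Toy data: 10 "videos", each a set of 120 points with 64-d local descriptors.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 5)
videos = [rng.normal(loc=y, size=(120, 64)) for y in labels]

n_words = 50                                   # assumed vocabulary size
codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(np.vstack(videos))

X = []
for desc in videos:
    w = codebook.predict(desc)                 # visual word of each space-time point
    ctx = context_histograms(w, n_words)       # context descriptor of each point
    X.append(ctx.mean(axis=0))                 # pool per video (an assumption)
clf = SVC(kernel="rbf").fit(X, labels)         # SVM classification, as in the keywords
print("training accuracy:", clf.score(X, labels))
```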
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11562/763761