An augmented representation of activity in video using semantic-context information
CASTELLANI, Umberto;
2014-01-01
Abstract
Learning and recognizing activity in videos is an important task in computer vision, and a difficult one. In this paper, we propose a new method that combines local and global context information to extract a bag-of-words-like representation of each space-time point. Each space-time point is described by a bag of visual words that encodes its relationships with the remaining space-time points in the video, defining its space-time context. Experiments on the KTH action-recognition benchmark show that our approach achieves accuracy comparable to the state of the art.
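As a rough illustration of the idea described in the abstract, the following Python sketch builds, for each space-time point, a histogram over the visual words assigned to the remaining points of the same video. This is not the authors' implementation: the function name `context_representations`, the use of NumPy/scikit-learn, the choice of k-means codebook, and the simple "all other points" context (with no spatial or temporal weighting) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def context_representations(descriptors, codebook):
    """Hypothetical sketch: one context histogram per space-time point.

    `descriptors` is an (n_points, d) array of local descriptors for a
    single video (e.g. computed around space-time interest points);
    `codebook` is a fitted KMeans model defining the visual vocabulary.
    """
    # Assign every point to its nearest visual word.
    words = codebook.predict(descriptors)                    # (n_points,)
    k = codebook.n_clusters
    # Global bag-of-words of the whole video.
    global_hist = np.bincount(words, minlength=k).astype(float)
    # Each point's context is the word distribution over the *other* points,
    # i.e. the global histogram with the point's own word removed.
    contexts = np.tile(global_hist, (len(words), 1))
    contexts[np.arange(len(words)), words] -= 1.0
    # Normalize so videos with different numbers of points are comparable.
    contexts /= np.maximum(contexts.sum(axis=1, keepdims=True), 1e-12)
    return contexts

# Usage with random stand-in data; real descriptors would come from a
# space-time interest-point detector.
rng = np.random.default_rng(0)
codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(
    rng.normal(size=(500, 72)))
ctx = context_representations(rng.normal(size=(120, 72)), codebook)
print(ctx.shape)  # (120, 50): one context histogram per space-time point
```

These per-point context histograms could then be fed, together with the local descriptors, to any standard classifier; the actual method in the paper may weight or structure the context differently.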