Crowdsearching Training Sets for Image Classification

Abdulhak, Sami Abduljalil; Riviera, Walter; Cristani, Marco
2015-01-01

Abstract

The success of an object classifier depends strongly on its training set, a fact that seems to be generally neglected in the computer vision community, which focuses primarily on the construction of descriptive features and the design of fast and effective learning mechanisms. Moreover, collecting a training set is a very expensive step, requiring considerable manpower to select the most representative samples for an object class. In this paper, we address this problem, following the very recent trend of automating the collection of training images for image classification: in particular, we exploit a source of information never considered so far for this purpose, namely textual tags. Textual tags are usually attached by the crowd to images on social platforms such as Flickr, associating the visual content with explicit semantics, which is unfortunately noisy in many cases. Our approach leverages this shared knowledge, collecting images that span the visual variance of an object class while removing noise through different query expansion filtering techniques. Comparative results support our method, which is capable of automatically generating, in a few minutes, a training dataset that leads to 81.41% average precision on the PASCAL VOC 2012 dataset.
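The paper's specific filtering pipeline is not reproduced here. Purely as an illustration of the idea sketched in the abstract, the following minimal Python snippet shows one simple way crowd tags could drive query expansion and noise filtering: all names, the co-occurrence heuristic, the overlap threshold, and the toy data are hypothetical assumptions, not the authors' method.

```python
from collections import Counter

def expand_query(seed_tag, tagged_images, top_k=3):
    """Hypothetical query expansion: add the tags that most often
    co-occur with the seed class name across the candidate pool."""
    co_occurrence = Counter()
    for tags in tagged_images.values():
        if seed_tag in tags:
            co_occurrence.update(t for t in tags if t != seed_tag)
    return {seed_tag} | {t for t, _ in co_occurrence.most_common(top_k)}

def filter_candidates(seed_tag, tagged_images, min_overlap=2):
    """Keep images whose tag sets share at least `min_overlap` tags
    with the expanded query; the threshold is an illustrative guess."""
    expanded = expand_query(seed_tag, tagged_images)
    return [img for img, tags in tagged_images.items()
            if len(tags & expanded) >= min_overlap]

# Toy candidate pool: image id -> crowd-provided tags (made-up data).
candidates = {
    "img1": {"dog", "pet", "park", "animal"},
    "img2": {"dog", "car", "street"},   # tagged "dog" but likely noise
    "img3": {"dog", "animal", "puppy"},
    "img4": {"sunset", "beach"},        # unrelated to the class
}
print(filter_candidates("dog", candidates))  # -> ['img1', 'img3']
```

In this toy run, img2 is discarded even though it carries the "dog" tag, because its remaining tags do not agree with the expanded query; this mirrors the intuition that agreement between an image's crowd tags and an expanded class query can signal a relevant training sample.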
2015
ISBN: 978-3-319-23230-0
Classification, Computer vision, Image analysis, Image processing, Semantics
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/971109
Citations
  • Scopus 0