Semantically-driven automatic creation of training sets for object recognition

CHENG, Dong Seon; SETTI, Francesco; CRISTANI, Marco
2015-01-01

Abstract

In the object recognition community, much effort has gone into devising expressive object representations and powerful learning strategies for building classifiers that achieve high accuracy and generalize well. Training sets, by contrast, have historically received little attention: by and large, they have been assembled with substantial human intervention, which takes considerable time. In this paper, we present a strategy for automatic training set generation. The strategy combines semantic knowledge from WordNet with the statistical power of Google Ngram to select a set of meaningful text strings related to the class label (e.g., “cat”). These strings are then fed into the Google Images search engine, producing sets of images with high training value. Focusing on the classes of several object recognition benchmarks (PASCAL VOC 2012, Caltech-256, ImageNet, GRAZ and OxfordPet), our approach collects images that are novel with respect to those obtained by querying Google Images with the class label alone. In particular, we show that the gathered images better capture the different visual facets of a concept and thus encode the intra-class variance more successfully. As a consequence, training standard classifiers on this data yields performance not far from that obtained with classical hand-crafted training sets. In addition, our datasets generalize well and are stable, that is, they provide similar performance across diverse test datasets. The process requires no manual intervention and completes in a few hours.
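The query-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how WordNet and Google Ngram are combined, so the ranking rule below (sort candidate phrases by corpus frequency and keep the top few) is an assumption. The function `select_queries`, the dictionaries `TOY_RELATED` and `TOY_COUNTS`, and all terms and counts in them are hypothetical stand-ins for the real resources.

```python
from typing import Dict, List


def select_queries(label: str,
                   related_terms: Dict[str, List[str]],
                   phrase_counts: Dict[str, int],
                   top_k: int = 3) -> List[str]:
    """Rank candidate query strings for a class label by corpus frequency.

    `related_terms` plays the role of a semantic resource such as WordNet,
    and `phrase_counts` the role of an n-gram frequency table. Both are
    plain dictionaries here so that the sketch runs without external data.
    """
    # Candidates: the label itself plus its semantically related phrases.
    candidates = [label] + related_terms.get(label, [])
    # Phrases missing from the table count as 0. Sort by descending count,
    # breaking ties alphabetically so the result is deterministic.
    ranked = sorted(set(candidates),
                    key=lambda phrase: (-phrase_counts.get(phrase, 0), phrase))
    return ranked[:top_k]


# Hypothetical toy data, invented for illustration only.
TOY_RELATED = {"cat": ["kitten", "tabby cat", "siamese cat", "manx cat"]}
TOY_COUNTS = {"cat": 5000, "kitten": 1200, "siamese cat": 300, "tabby cat": 150}

queries = select_queries("cat", TOY_RELATED, TOY_COUNTS, top_k=3)
print(queries)  # ['cat', 'kitten', 'siamese cat']
```

In a full pipeline, each selected string would be submitted to an image search engine and the returned images pooled into the training set for the class. That step is left out because it depends on an external service.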
Object recognition; Training dataset; Semantics; WordNet; Internet search
Files in this item:
SematicTrainer_v_0.19.pdf

Open access

Type: Pre-print document
License: Public domain
Size: 2.88 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11562/932902