Low-level multimodal integration on Riemannian manifolds for automatic pedestrian detection

CRISTANI, Marco; MARTELLI, Samuele; MURINO, Vittorio
2012

Abstract

Automated pedestrian detection is one of the most active topics in computer vision, with important applications in surveillance and security. Integrating information from different imaging modalities, such as thermal infrared and the visible spectrum, can significantly improve the detection rate with respect to monomodal strategies. A common scheme extracts two sets of features, one from the thermal image and one from the visible image of the same scene, and stacks them into a single feature set, ignoring possibly meaningful inter-media dependencies. Here we propose a fusion scheme that acts at the feature level: it takes standard pixel characteristics (such as first- and second-order spatial derivatives or Local Binary Patterns) and builds a composite descriptor that encodes both the information coming from the separate modalities and their cross-modal relationships in the form of covariances. The descriptor, which lies on a Riemannian manifold, is projected onto a Euclidean tangent space and then fed into a Support Vector Machine classifier. Experiments on the OTCBVS dataset [1], validated statistically, demonstrate that our method significantly outperforms single-modality policies as well as alternative fusion schemes at the pixel, feature, and decision levels.
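The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature set (intensity plus absolute first and second derivatives, with Local Binary Patterns omitted), the regularization constant `eps`, the choice of the identity matrix as the tangent point, and all function names are assumptions made for illustration. The covariance of the stacked per-pixel features contains the within-modality blocks on its diagonal and the cross-modal covariances off the diagonal.

```python
import numpy as np


def pixel_features(img):
    """Per-pixel features (assumed set): intensity, |first| and |second| derivatives."""
    img = np.asarray(img, dtype=float)
    dy, dx = np.gradient(img)        # first-order derivatives along rows and columns
    dyy = np.gradient(dy, axis=0)    # second-order derivative along rows
    dxx = np.gradient(dx, axis=1)    # second-order derivative along columns
    feats = [img, np.abs(dx), np.abs(dy), np.abs(dxx), np.abs(dyy)]
    return np.stack([f.ravel() for f in feats], axis=0)  # shape (5, H*W)


def multimodal_covariance(visible, thermal, eps=1e-6):
    """Joint covariance of visible and thermal features for one registered window."""
    stacked = np.vstack([pixel_features(visible), pixel_features(thermal)])
    cov = np.cov(stacked)            # rows are variables, columns are pixels
    # Small ridge (assumed value) keeps the matrix strictly positive definite.
    return cov + eps * np.eye(cov.shape[0])


def log_map_identity(spd):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def tangent_vector(spd):
    """Project onto the tangent space at the identity (assumed) and vectorize."""
    log_m = log_map_identity(spd)
    rows, cols = np.triu_indices(log_m.shape[0])
    # Off-diagonal entries are scaled by sqrt(2) so the Euclidean norm of the
    # vector equals the Frobenius norm of the symmetric matrix.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_m[rows, cols]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visible = rng.random((32, 16))   # synthetic stand-ins for a registered image pair
    thermal = rng.random((32, 16))
    descriptor = tangent_vector(multimodal_covariance(visible, thermal))
    print(descriptor.shape)
```

The resulting vectors live in an ordinary Euclidean space, so they could be passed to any standard Support Vector Machine implementation for training and classification.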
ISBN: 9781467304177

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11562/471962