
Center and Scale Prediction: Anchor-free Approach for Pedestrian and Face Detection

Hasan Irtiza
2023-01-01

Abstract

Object detection generally requires sliding-window classifiers in traditional approaches or anchor-box-based predictions in modern deep learning approaches. However, either of these approaches requires tedious configurations of bounding boxes. Generally speaking, single-class object detection is to tell where the object is and how big it is. Traditional methods combine the "where" and "how" subproblems into a single one through the overall judgement of various scales of bounding boxes. In view of this, we are interested in whether the "where" and "how" subproblems can be separated into two independent subtasks, to ease both the problem definition and the difficulty of training. Accordingly, we provide a new perspective in which detecting objects is approached as a high-level semantic feature detection task. Like edges, corners, blobs and other feature detectors, the proposed detector scans for feature points all over the image, for which convolution is naturally suited. However, unlike these traditional low-level features, the proposed detector goes for a higher-level abstraction, that is, we are looking for central points where there are objects, and modern deep models are already capable of such a high-level semantic abstraction. Like blob detection, we also predict the scales of the central points, which is also a straightforward convolution. Therefore, in this paper, pedestrian and face detection is simplified as a straightforward center and scale prediction task through convolutions. This way, the proposed method enjoys an anchor-free setting, considerably reducing the difficulty of training configuration and hyper-parameter optimization. Though structurally simple, it presents competitive accuracy on several challenging benchmarks, including pedestrian detection and face detection. Furthermore, a cross-dataset evaluation is performed, demonstrating a superior generalization ability of the proposed method.
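To make the center-and-scale idea concrete, the following is a minimal sketch (not the authors' released implementation) of such a prediction head in PyTorch. The module name CenterScaleHead, the 256-channel feature width, and the use of the scale channel as a log-height map are illustrative assumptions; only the overall structure (a shared convolution followed by two 1x1 convolutions, one for "where" and one for "how big") reflects the idea described in the abstract.

```python
# Minimal sketch, assuming an upstream backbone that outputs a feature map.
# Not the authors' implementation; names and channel sizes are hypothetical.
import torch
import torch.nn as nn

class CenterScaleHead(nn.Module):
    """Predicts a per-pixel center heatmap and a scale map via convolutions."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolutions: one channel for "where" (center), one for "how big" (scale)
        self.center = nn.Conv2d(256, 1, kernel_size=1)
        self.scale = nn.Conv2d(256, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        x = self.reduce(feats)
        center_map = torch.sigmoid(self.center(x))  # probability of an object center
        scale_map = self.scale(x)                    # e.g. log-height at each location
        return center_map, scale_map

# Usage on a dummy feature map (batch 1, 256 channels, a downsampled grid)
feats = torch.randn(1, 256, 160, 160)
centers, scales = CenterScaleHead()(feats)
print(centers.shape, scales.shape)  # both torch.Size([1, 1, 160, 160])
```

Because both outputs are dense maps produced by plain convolutions, no anchor boxes or box-matching rules are needed; detections can be read off as local maxima of the center map together with the scale predicted at the same location.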
2023
Object Detection, Convolutional Neural Networks, Feature Detection, Anchor-free
Files in this item:
File: 1-s2.0-S0031320322005519-main.pdf (open access)
License: Creative Commons
Size: 3.08 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1155347
Citations
  • Scopus: 33
  • Web of Science (ISI): 25