FLK: A filter with learned kinematics for real-time 3D human pose estimation

Enrico Martini; Michele Boldo; Nicola Bombieri
2024-01-01

Abstract

There is growing interest in adopting 3D human pose estimation in safety-critical systems, from healthcare to Industry 5.0. When applied in such settings, however, the underlying neural networks may produce inaccurate estimates. Besides imprecise or inconsistent annotations in the training dataset, the inaccuracy is caused by poor image quality, rare poses, dropped frames, or heavy occlusions in the scene. In addition, these scenarios often impose temporal constraints on the software, such as real-time operation and zero or low latency, which make many of the filtering solutions proposed in the literature inapplicable. This paper proposes FLK, a Filter with Learned Kinematics, to refine 3D human motion data in real time and at zero or low latency. The temporal core combines a Kalman filter and a low-pass filter, with the motion model learned through a recurrent neural network. The spatial core exploits the biomechanical constraints of the human body to enforce spatial coherence between keypoints. Together, the two cores allow the filter to address different types of noise, from jitter to dropped frames. We test the filter on motion data from multiple datasets and seven 3D human pose estimation backbones, improving accuracy by up to 140 mm with non-Gaussian noise and by up to 53 mm with missing information.
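The temporal and spatial ideas summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hand-written constant-velocity motion model in place of the learned recurrent one, an exponential low-pass filter, and a simple bone-length projection as the biomechanical constraint. All names (`ScalarKalmanCV`, `KeypointFilter`, `enforce_bone_length`) and parameter values are hypothetical.

```python
import math


class ScalarKalmanCV:
    """Constant-velocity Kalman filter for one coordinate.

    Assumption: the fixed motion model here stands in for the learned
    (RNN-based) model mentioned in the abstract.
    """

    def __init__(self, q=1e-2, r=1.0):
        self.x = None                      # position estimate
        self.v = 0.0                       # velocity estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def step(self, z, dt=1.0):
        """Advance one frame; z is a measurement or None if the frame dropped."""
        if self.x is None:                 # wait for the first measurement
            if z is None:
                return None
            self.x = float(z)
            return self.x
        # Predict: x' = x + v*dt, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        P = self.P
        x = self.x + self.v * dt
        v = self.v
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        if z is not None:
            # Update with measurement matrix H = [1, 0].
            s = p00 + self.r
            k0, k1 = p00 / s, p10 / s
            y = z - x
            x, v = x + k0 * y, v + k1 * y
            P = [[(1 - k0) * p00, (1 - k0) * p01],
                 [p10 - k1 * p00, p11 - k1 * p01]]
        else:
            # Dropped frame: keep the prediction and the inflated covariance.
            P = [[p00, p01], [p10, p11]]
        self.x, self.v, self.P = x, v, P
        return x


class KeypointFilter:
    """Per-coordinate Kalman filter followed by an exponential low-pass."""

    def __init__(self, alpha=0.5):
        self.kf = [ScalarKalmanCV() for _ in range(3)]
        self.alpha = alpha
        self.smooth = None

    def step(self, obs):
        """obs is an (x, y, z) tuple, or None for a dropped frame."""
        est = [k.step(None if obs is None else obs[i])
               for i, k in enumerate(self.kf)]
        if est[0] is None:                 # nothing observed yet
            return None
        if self.smooth is None:
            self.smooth = est
        else:
            a = self.alpha
            self.smooth = [a * e + (1 - a) * s
                           for e, s in zip(est, self.smooth)]
        return list(self.smooth)


def enforce_bone_length(parent, child, length):
    """Move child along the parent-to-child direction to a fixed bone length."""
    d = [c - p for p, c in zip(parent, child)]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        return list(child)
    return [p + x * length / norm for p, x in zip(parent, d)]
```

In this sketch, a dropped frame is passed as `None`, so the filter falls back on its motion model and still returns an estimate without waiting for future frames. That is one simple way to keep latency at zero.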
2024
Keywords: Human motion refinement; Human pose estimation; Filtering; Denoising; Completion; Kalman filter
Files in this item: 2024_FLK.pdf (Adobe PDF, 1.16 MB; open access; license: public domain)

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1131906