Diffusion-Based Unsupervised Pre-training for Automated Recognition of Vitality Forms

Cigdem Beyan
2024-01-01

Abstract

Social communication involves interpreting nonverbal behaviors and detecting and anticipating others’ actions and intentions. An action conveys not only its goal and motor intention but also its form, i.e., variations in how the action is executed. These variations, termed vitality forms, communicate attitudes during interactions, such as being gentle, calm, vigorous, or rude. Automatic vitality form recognition has potential applications in social robotics, social skills training, and therapy, yet it remains rarely studied. This paper introduces an unsupervised pre-training approach that takes 2D body keypoint trajectories as input and uses diffusion models to learn more effective features for representing these trajectories. The features learned by the diffusion model’s encoder are then used to train a multilayer perceptron for vitality form recognition. Experimental analysis shows superior performance of the proposed method, both across different videos and on action classes not encountered during training.
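
As a rough illustration of the noising step that diffusion-based pre-training on trajectories relies on, the sketch below corrupts a toy 2D keypoint trajectory at a chosen diffusion step. It assumes a standard DDPM-style forward process with a linear noise schedule; the abstract does not specify the formulation used in the paper, and the function names, schedule values, and toy input are all hypothetical. The encoder, denoiser, and multilayer perceptron are not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances, linearly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_trajectory(trajectory, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    trajectory: list of frames, each a flat list of keypoint coordinates.
    Returns the noised trajectory and the Gaussian noise that was added,
    which is the usual regression target when training a denoiser.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in trajectory]
    noised = [
        [signal * value + sigma * eps for value, eps in zip(frame, frame_noise)]
        for frame, frame_noise in zip(trajectory, noise)
    ]
    return noised, noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy input: 4 frames of one keypoint (x, y) moving on a line.
    x0 = [[0.1 * f, 0.2 * f] for f in range(4)]
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    for t in (0, 499, 999):
        xt, _ = noise_trajectory(x0, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), xt[1])
```

At early steps the trajectory is nearly unchanged, while at late steps it is close to pure Gaussian noise; a denoising network trained to undo this corruption is what would supply the encoder features in a pipeline of this kind.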
2024
Keywords: vitality forms, nonverbal communication, unsupervised pre-training, diffusion models, autoencoders, gestures, actions, trajectory

Open access

Type: Post-print document
License: Creative Commons

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1125909