Evaluation of Human Action Quality with Linear Recurrent Units and Graph Attention Networks on Embedded Systems

Ziche, Filippo; Bombieri, Nicola

Recent evolutions of recurrent neural networks (RNNs), such as S4, S4D, and LRU, have shown remarkable potential for very long-range sequence modeling tasks in vision, language, and audio, capturing dependencies over tens of thousands of steps. Unlike transformers, whose memory consumption becomes a significant challenge at large context sizes, these models are a promising alternative because they can operate effectively on embedded systems. While they have been evaluated for classification and segmentation tasks, no work in the literature has applied them in the context of human pose estimation. In this work, we propose an architecture that combines such state space models (SSMs) with graph attention networks (GATs) to enable their application to the evaluation of human actions on embedded systems.
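
As an illustration of why linear recurrent models suit memory-constrained devices, the following is a minimal sketch of a diagonal linear recurrence in the style of an LRU, written in plain Python. It is not the architecture proposed in this work: the function names (`lru_step`, `run_lru`) and the parameter names (`lam`, `B`, `C`, `D`) are hypothetical, the input is assumed to be a scalar per time step, and details of the published LRU such as the eigenvalue parameterization and normalization are omitted. The point it shows is that the recurrent state has a fixed size that does not grow with sequence length.

```python
def lru_step(state, u, lam, B, C, D):
    """One step of a diagonal linear recurrence (illustrative sketch).

    state : list of N complex numbers (hidden state)
    u     : scalar input at this time step
    lam   : list of N complex eigenvalues, assumed to satisfy |lam_i| < 1
    B, C  : lists of N input/output weights (assumed names)
    D     : scalar skip-connection weight (assumed name)
    """
    # Each state component evolves independently: x_i <- lam_i * x_i + B_i * u
    new_state = [l * s + b * u for l, s, b in zip(lam, state, B)]
    # Real-valued readout plus a direct skip term from the input
    y = sum((c * s).real for c, s in zip(C, new_state)) + D * u
    return new_state, y


def run_lru(inputs, lam, B, C, D):
    """Process a whole sequence while keeping only N state values in memory."""
    state = [0j] * len(lam)
    outputs = []
    for u in inputs:
        state, y = lru_step(state, u, lam, B, C, D)
        outputs.append(y)
    return outputs, state


if __name__ == "__main__":
    # Impulse response of a single-state recurrence with eigenvalue 0.5
    outputs, final_state = run_lru([1.0, 0.0, 0.0], [0.5 + 0j], [1.0], [1.0], 0.0)
    print(outputs)
```

In this sketch the memory needed at inference time is set by the number of state components, not by the number of steps already processed, which is the property that contrasts with attention over a growing context.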