A Multi-Modal Learning System for On-Line Surgical Action Segmentation
Giacomo De Rossi;Serena Roin;Francesco Setti;Riccardo Muradore
2020-01-01
Abstract
Surgical action recognition and temporal segmentation are building blocks needed to give surgical robots some degree of autonomy. In this paper, we present a deep learning model that relies on video and kinematic data to output, in real time, the current action in a surgical procedure. The proposed neural network architecture is composed of two sub-networks: a Spatial-Kinematic Network, which produces high-level features by processing images and kinematic data, and a Temporal Convolutional Network, which filters these features over a sliding window to stabilize their changes over time. Since we are interested in applications to real-time supervisory control of robots, we focus on an efficient and causal implementation, i.e., the prediction at sample k depends only on previous observations. We tested our causal architecture on the publicly available JIGSAWS dataset, where it outperforms comparable state-of-the-art non-causal algorithms by up to 8.6% in edit score.
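The causality constraint mentioned in the abstract can be illustrated with a minimal sketch of a causal sliding-window filter. This is not the authors' Temporal Convolutional Network: the function name `causal_conv1d`, the scalar feature sequence, and the kernel weights are hypothetical, and the sketch assumes the window covers the current sample together with earlier ones, treating samples before the start of the sequence as zero.

```python
from typing import List, Sequence


def causal_conv1d(features: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Filter a 1-D feature sequence with a causal sliding window.

    Output k is a weighted sum of features[k], features[k-1], ...,
    features[k - len(kernel) + 1]. Samples before the start of the
    sequence are treated as zero, and no future sample is ever read.
    """
    out: List[float] = []
    for k in range(len(features)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = k - j  # step backwards in time only
            if idx >= 0:
                acc += w * features[idx]
        out.append(acc)
    return out


# Hypothetical per-frame feature (e.g. a score for one action) and weights.
x = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
kernel = [0.5, 0.3, 0.2]  # kernel[0] weights the current sample
y = causal_conv1d(x, kernel)

# Changing the last sample must not alter any earlier output.
x_future_changed = list(x)
x_future_changed[5] = 9.0
y_future_changed = causal_conv1d(x_future_changed, kernel)
print(y[:5] == y_future_changed[:5])
```

Because each output only reads samples at or before its own index, the filter can run on-line as frames arrive, which is the property a real-time supervisory controller needs. A non-causal filter would also read later samples and so could not be evaluated until those frames were available.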