Surgical action recognition and temporal segmentation is a building block needed to provide some degree of autonomy to surgical robots. In this paper, we present a deep learning model that relies on videos and kinematic data to output, in real time, the current action in a surgical procedure. The proposed neural network architecture is composed of two sub-networks: a Spatial-Kinematic Network, which produces high-level features by processing images and kinematic data, and a Temporal Convolutional Network, which filters such features temporally over a sliding window to stabilize their changes over time. Since we are interested in applications to real-time supervisory control of robots, we focus on an efficient and causal implementation, i.e., the prediction at sample k depends only on previous observations. We tested our causal architecture on the publicly available JIGSAWS dataset, outperforming comparable state-of-the-art non-causal algorithms by up to 8.6% in the edit score.
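The causality constraint described above (the prediction at sample k depends only on observations up to k) is the defining property of a causal temporal convolution. The following is a minimal NumPy sketch of that property, not the authors' implementation: `causal_conv1d` is a hypothetical helper that left-pads the input so each output sample is a weighted sum of past samples only.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: output at time k uses only x[0..k].

    x: (T,) feature sequence; w: (K,) kernel taps over the past.
    Left-pads with zeros so the output has the same length as the input.
    """
    K = len(w)
    xp = np.concatenate([np.zeros(K - 1), x])  # pad the past only
    return np.array([xp[k:k + K] @ w[::-1] for k in range(len(x))])

# Demo: perturbing future samples leaves earlier predictions unchanged,
# as required for on-line (real-time) action segmentation.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w = rng.standard_normal(3)
y1 = causal_conv1d(x, w)

x_future = x.copy()
x_future[10:] += 5.0             # change only samples k >= 10
y2 = causal_conv1d(x_future, w)

assert np.allclose(y1[:10], y2[:10])  # outputs before k = 10 are identical
```

A non-causal (centered) convolution would fail this check, since its output at sample k would also depend on samples after k, which are unavailable in a real-time setting.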
|Title:||A Multi-Modal Learning System for On-Line Surgical Action Segmentation|
|Publication date:||In press|
|Appears in the categories:||04.01 Contribution in conference proceedings|