
A Multi-Modal Learning System for On-Line Surgical Action Segmentation

Giacomo De Rossi, Serena Roin, Francesco Setti, Riccardo Muradore
2020-01-01

Abstract

Surgical action recognition and temporal segmentation is a building block needed to provide some degree of autonomy to surgical robots. In this paper, we present a deep learning model that relies on videos and kinematic data to output, in real time, the current action in a surgical procedure. The proposed neural network architecture is composed of two sub-networks: a Spatial-Kinematic Network, which produces high-level features by processing images and kinematic data, and a Temporal Convolutional Network, which filters such features temporally over a sliding window to stabilize their changes over time. Since we are interested in applications to real-time supervisory control of robots, we focus on an efficient and causal implementation, i.e., the prediction at sample k depends only on previous observations. We tested our causal architecture on the publicly available JIGSAWS dataset, outperforming comparable state-of-the-art non-causal algorithms by up to 8.6% in the edit score.
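The causal constraint described in the abstract (the prediction at sample k depends only on observations up to k) can be sketched with a left-padded 1-D convolution. This is a minimal illustrative example, not the paper's implementation; the function name and filter values are assumptions:

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: output[k] uses only x[k-len(w)+1 .. k].

    x: (T,) feature sequence; w: (K,) filter taps.
    The input is left-padded with zeros so no future sample leaks in.
    """
    K = len(w)
    xp = np.concatenate([np.zeros(K - 1), x])  # padding on the past side only
    return np.array([np.dot(xp[k:k + K], w[::-1]) for k in range(len(x))])

# Causality check: perturbing a future sample must not change earlier outputs.
x = np.arange(6, dtype=float)
w = np.array([0.5, 0.3, 0.2])
y1 = causal_conv1d(x, w)
x2 = x.copy()
x2[5] = 100.0                        # change only the last sample
y2 = causal_conv1d(x2, w)
print(np.allclose(y1[:5], y2[:5]))   # True: outputs 0..4 are unaffected
```

Stacking several such layers (with increasing dilation) yields a Temporal Convolutional Network whose receptive field covers the sliding window while remaining causal, which is what makes on-line, real-time prediction possible.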
Keywords: Multi-Modal Learning, Action Segmentation, Surgical Robot

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1015035
Citations
  • Scopus: 2
  • Web of Science: 1