In deep learning, visualization techniques extract the salient patterns exploited by deep networks to perform a task (e.g. image classification) focusing on single images. These methods allow a better understanding of these complex models, empowering the identification of the most informative parts of the input data. Beyond the deep network understanding, visual saliency is useful for many quantitative reasons and applications, both in the 2D and 3D domains, such as the analysis of the generalization capabilities of a classifier and autonomous navigation. In this thesis, we describe an approach to cope with the interpretability problem of a convolutional neural network and propose our ideas on how to exploit the visualization for applications like image classification and active object recognition. After a brief overview on common visualization methods producing attention/saliency maps, we will address two separate points: firstly, we will describe how visual saliency can be effectively used in the 2D domain (e.g. RGB images) to boost image classification performances: as a matter of fact, visual summaries, i.e. a compact representation of an ensemble of saliency maps, can be used to improve the classification accuracy of a network through summary-driven specializations. Then, we will present a 3D active recognition system that allows to consider different views of a target object, overcoming the single-view hypothesis of classical object recognition, making the classification problem much easier in principle. Here we adopt such attention maps in a quantitative fashion, by building a 3D dense saliency volume which fuses together saliency maps obtained from different viewpoints, obtaining a continuous proxy on which parts of an object are more discriminative for a given classifier. Finally, we will show how to inject this representations in a real world application, so that an agent (e.g. robot) can move knowing the capabilities of its classifier.

Saliency-based approaches for multidimensional explainability of deep networks

Marco Carletti
2020-01-01

Abstract

In deep learning, visualization techniques extract the salient patterns exploited by deep networks to perform a task (e.g. image classification) focusing on single images. These methods allow a better understanding of these complex models, empowering the identification of the most informative parts of the input data. Beyond the deep network understanding, visual saliency is useful for many quantitative reasons and applications, both in the 2D and 3D domains, such as the analysis of the generalization capabilities of a classifier and autonomous navigation. In this thesis, we describe an approach to cope with the interpretability problem of a convolutional neural network and propose our ideas on how to exploit the visualization for applications like image classification and active object recognition. After a brief overview on common visualization methods producing attention/saliency maps, we will address two separate points: firstly, we will describe how visual saliency can be effectively used in the 2D domain (e.g. RGB images) to boost image classification performances: as a matter of fact, visual summaries, i.e. a compact representation of an ensemble of saliency maps, can be used to improve the classification accuracy of a network through summary-driven specializations. Then, we will present a 3D active recognition system that allows to consider different views of a target object, overcoming the single-view hypothesis of classical object recognition, making the classification problem much easier in principle. Here we adopt such attention maps in a quantitative fashion, by building a 3D dense saliency volume which fuses together saliency maps obtained from different viewpoints, obtaining a continuous proxy on which parts of an object are more discriminative for a given classifier. Finally, we will show how to inject this representations in a real world application, so that an agent (e.g. robot) can move knowing the capabilities of its classifier.
2020
deep learning, saliency, visual summaries, saliency volumes, image classification, autonomous navigation
File in questo prodotto:
File Dimensione Formato  
phd_thesis_carletti_marco_univr_20200422.pdf

Open Access dal 09/05/2020

Descrizione: Doctoral thesis
Tipologia: Tesi di dottorato
Licenza: Creative commons
Dimensione 10.13 MB
Formato Adobe PDF
10.13 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1016496
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact