Probabilistically Robust Counterfactual Explanations under Model Changes

Luca Marzari;Francesco Leofante;Ferdinando Cicalese;Alessandro Farinelli
2025-01-01

Abstract

We study the problem of generating robust counterfactual explanations for deep learning models subject to model changes. We focus on plausible model changes that alter model parameters and propose a novel framework for reasoning about robustness in this setting. To motivate our solution, we begin by showing, for the first time, that computing the robustness of counterfactuals with respect to model changes is NP-hard. As this (practically) rules out scalable algorithms for computing robustness exactly, we propose a novel probabilistic approach that provides tight estimates of robustness with strong guarantees while preserving scalability. Notably, unlike existing solutions targeting plausible model changes, our approach imposes no requirements on the network being analysed, which enables robustness analysis on a wider range of architectures, including state-of-the-art tabular transformers. A thorough experimental analysis on four binary classification datasets shows that our method outperforms existing methods for generating robust explanations, improving the state of the art.
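The abstract does not specify the estimator behind the probabilistic approach. Purely as a generic illustration of the idea of estimating robustness with a statistical guarantee, and not as the authors' method, the sketch below uses plain Monte Carlo sampling with a Hoeffding bound. It assumes a hypothetical one-hidden-layer ReLU classifier, models "plausible model changes" as i.i.d. Gaussian noise on every parameter, and uses hypothetical function names (`hoeffding_sample_size`, `mlp_predict`, `estimate_cf_robustness`). With n ≥ ln(2/δ)/(2ε²) samples, the empirical rate p̂ is within ε of the true robustness with probability at least 1 − δ.

```python
import math

import numpy as np


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest n with 2*exp(-2*n*eps^2) <= delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def mlp_predict(params, x) -> int:
    """Binary label of a hypothetical one-hidden-layer ReLU MLP."""
    W1, b1, w2, b2 = params
    h = np.maximum(0.0, W1 @ x + b1)
    return int(w2 @ h + b2 > 0.0)


def estimate_cf_robustness(params, x_cf, target, sigma, eps=0.05, delta=0.01, seed=0):
    """Monte Carlo estimate of the probability that x_cf keeps label `target`
    when every parameter receives i.i.d. N(0, sigma^2) noise (an ASSUMED
    model-change distribution, chosen only for illustration)."""
    rng = np.random.default_rng(seed)
    n = hoeffding_sample_size(eps, delta)
    hits = 0
    for _ in range(n):
        # Sample one perturbed model by adding noise to each parameter tensor.
        perturbed = [p + sigma * rng.standard_normal(np.shape(p)) for p in params]
        hits += int(mlp_predict(perturbed, x_cf) == target)
    p_hat = hits / n
    # With probability >= 1 - delta, the true robustness is at least p_hat - eps.
    return p_hat, max(0.0, p_hat - eps)
```

For example, `estimate_cf_robustness([np.eye(2), np.zeros(2), np.array([1.0, -1.0]), 0.0], np.array([2.0, 0.5]), target=1, sigma=0.1)` draws 1,060 perturbed models for ε = 0.05 and δ = 0.01. The sample size depends only on ε and δ, not on the network, which mirrors the architecture-agnostic property claimed above; exact verification, by contrast, is NP-hard.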
Keywords: Algorithmic recourse, Counterfactual explanations, Explainable AI, Robustness of explanations


Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1181128