Probabilistically Robust Counterfactual Explanations under Model Changes

Marzari, Luca; Leofante, Francesco; Cicalese, Ferdinando; Farinelli, Alessandro

doi:10.1016/j.artint.2025.104459

We study the problem of generating robust counterfactual explanations for deep learning models subject to model changes. We focus on plausible model changes that alter model parameters and propose a novel framework to reason about robustness in this setting. To motivate our solution, we begin by showing, for the first time, that computing the robustness of counterfactuals with respect to model changes is NP-hard. As this rules out, in practice, the existence of scalable algorithms for computing robustness exactly, we propose a novel probabilistic approach that provides tight estimates of robustness with strong guarantees while preserving scalability. Remarkably, and unlike existing solutions targeting plausible model changes, our approach imposes no requirements on the network to be analysed, thus enabling robustness analysis on a wider range of architectures, including state-of-the-art tabular transformers. A thorough experimental analysis on four binary classification datasets shows that our method improves on the state of the art in generating robust explanations.
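To illustrate what a sampling-based robustness estimate can look like, the following is a minimal sketch and not the method proposed in the paper. It assumes a toy linear classifier, models plausible changes as independent uniform perturbations of at most `delta` per parameter, and uses a generic one-sided Hoeffding bound for the guarantee. The names `predict`, `estimate_robustness`, `delta`, and `alpha` are hypothetical and chosen only for this example.

```python
import math
import random


def predict(weights, bias, x):
    """Binary prediction of a toy linear classifier: 1 if w.x + b >= 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


def estimate_robustness(weights, bias, x_cf, delta,
                        n_samples=2000, alpha=0.01, seed=0):
    """Monte Carlo estimate of how often a counterfactual keeps its class.

    Assumption (not from the paper): a plausible model change shifts each
    parameter independently and uniformly within [-delta, +delta].
    Returns (p_hat, lower), where p_hat is the empirical validity rate and
    lower is a bound that holds with probability at least 1 - alpha by the
    one-sided Hoeffding inequality.
    """
    rng = random.Random(seed)
    target = predict(weights, bias, x_cf)  # class the counterfactual achieves
    valid = 0
    for _ in range(n_samples):
        # Sample one perturbed model from the assumed change set.
        w_pert = [w + rng.uniform(-delta, delta) for w in weights]
        b_pert = bias + rng.uniform(-delta, delta)
        if predict(w_pert, b_pert, x_cf) == target:
            valid += 1
    p_hat = valid / n_samples
    # Hoeffding: P(p_hat - p >= t) <= exp(-2 n t^2), solved for t at level alpha.
    margin = math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    return p_hat, max(0.0, p_hat - margin)


# Example: a counterfactual far from the decision boundary stays valid.
p_hat, lower = estimate_robustness([1.0, -2.0], 0.5, [3.0, 0.5], delta=0.1)
print(f"empirical validity: {p_hat:.3f}, lower bound: {lower:.3f}")
```

Because the estimate only needs forward passes of sampled models, a sketch of this kind places no structural requirement on the classifier; replacing `predict` with any other model's prediction function would leave the rest unchanged.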