Risk-aware shielding of Partially Observable Monte Carlo Planning policies

Giulio Mazzi;Alberto Castellini;Alessandro Farinelli
2023-01-01

Abstract

Partially Observable Monte Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit policy representation hinders interpretability and a proper evaluation of the risks an agent may incur. In this work, we propose a methodology based on Maximum Satisfiability Modulo Theory (MAX-SMT) for analyzing POMCP policies by inspecting their traces, namely, sequences of belief-action pairs generated by the algorithm. The proposed method explores local properties of the policy to build a compact and informative summary of the policy behaviour. Moreover, we introduce a rich and formal language that a domain expert can use to describe the expected behaviour of a policy. In more detail, we present a formulation that directly computes the risk involved in taking actions by considering the high-level elements specified by the expert. The final formula can identify risky decisions taken by POMCP that violate the expert indications. We show that this identification process can be used offline (to improve the policy’s explainability and identify anomalous behaviours) or online (to shield the risky decisions of the POMCP algorithm). We present an extended evaluation of our approach on four domains: the well-known tiger and rocksample benchmarks, a problem of velocity regulation in mobile robots, and a problem of battery management in mobile robots. We test the methodology against a state-of-the-art anomaly detection algorithm to show that our approach can be used to identify anomalous behaviours in faulty POMCP. We also show, comparing the performance of shielded and unshielded POMCP, that the shielding mechanism can improve the system’s performance. We provide an open-source implementation of the proposed methodologies at https://github.com/GiuMaz/XPOMCP.
2023
POMDP, POMCP, SMT, Risk-awareness, Shielding
Files in this item:
File: 2023_AIJ_RiskAware.pdf
Access: open access
License: Creative Commons
Size: 1.38 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1113667