Scalable Safe Policy Improvement for Single and Multi-Agent Systems

Federico Bianchi; Alberto Castellini; Alessandro Farinelli
2025-01-01

Abstract

Safe Policy Improvement (SPI) is crucial in domains where reliable decision-making must be achieved with limited environmental interaction, given the high costs and risks involved. Although existing SPI algorithms ensure improved safety over baseline policies, they struggle to scale to large and complex problems. In this work, we discuss new approaches to enhance the scalability and safety of SPI for both single-agent and multi-agent systems. For single-agent scenarios, we introduce MCTS-SPIBB, which combines Monte Carlo Tree Search with Safe Policy Improvement with Baseline Bootstrapping, and SDP-SPIBB, a scalable dynamic programming approach that extends SPI to large domains while preserving safety guarantees. For multi-agent settings, we present Factored Value-MCTS-SPIBB, the first SPI method to address large-scale multi-agent problems effectively. Through theoretical and empirical evaluation, we show that our algorithms scale efficiently and maintain the safety properties of SPI, thus making SPI applicable to complex and large-scale scenarios.
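As background, the SPIBB safety mechanism the abstract builds on can be sketched in a few lines: policy improvement is restricted to state-action pairs observed often enough in the batch dataset, and the policy falls back to ("bootstraps from") the baseline everywhere else. This is a minimal tabular sketch, not the paper's algorithm; the threshold `n_wedge` follows the standard SPIBB formulation, while the function name and array shapes are our own assumptions.

```python
import numpy as np

def spibb_improvement(pi_b, Q, counts, n_wedge):
    """SPIBB-style safe improvement of a baseline policy pi_b.

    pi_b:    (S, A) baseline policy probabilities
    Q:       (S, A) action-value estimates from the batch data
    counts:  (S, A) number of times each (s, a) appears in the dataset
    n_wedge: minimum visit count required before deviating from pi_b
    """
    n_states, _ = Q.shape
    pi = np.zeros_like(pi_b)
    for s in range(n_states):
        rare = counts[s] < n_wedge          # under-observed actions
        pi[s, rare] = pi_b[s, rare]         # bootstrap: keep baseline there
        free_mass = pi_b[s, ~rare].sum()    # mass we are allowed to move
        if free_mass > 0:
            q = np.where(rare, -np.inf, Q[s])
            pi[s, int(np.argmax(q))] += free_mass  # greedy on safe actions
    return pi
```

Under this scheme the new policy can only differ from the baseline where the data support the change, which is what yields SPIBB's safe-improvement guarantee with high probability.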
Keywords: Safe policy improvement; Reinforcement learning; Single-agent systems; Multi-agent systems
Files in this product:

File: 2025_AIRO2025_ScalableSPI.pdf (open access)
Description: Article
Type: Publisher's version
License: Public domain
Size: 300.7 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1186988
Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science: not available