RESEARCH PRODUCTS CATALOGUE

Online model adaptation in Monte Carlo tree search planning

M. Zuccotto; E. Fusa; A. Castellini; A. Farinelli

2024-01-01

Abstract

We propose a model-based reinforcement learning method that plans with Monte Carlo Tree Search. The approach starts from a black-box approximate model of the environment, developed by an expert in any modeling framework, and improves that model as new information is collected from the environment. This is crucial in real-world applications, where complete knowledge of a complex environment is impractical to obtain. The expert's model is first translated into a neural network, which is then updated periodically using data collected from the real environment, i.e., state-action-next-state triplets. We propose three methods for integrating the data acquired from the environment with the prior knowledge provided by the expert, and we evaluate the approach on an air quality and thermal comfort control domain in smart buildings. We compare the three proposed versions with standard Monte Carlo Tree Search planning on the expert's model (without adaptation), Proximal Policy Optimization (a popular model-free deep reinforcement learning approach), and Stochastic Lower Bounds Optimization (a popular model-based deep reinforcement learning approach). Results show that our approach outperforms all analyzed competitors.
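The adaptation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a two-weight linear model stands in for the neural network, the weights are initialised directly from a hypothetical expert model, and the "real" environment, all coefficients, and all names (`expert_model`, `true_environment`, `AdaptiveModel`) are assumptions made for illustration. The three integration methods and the Monte Carlo Tree Search planner are not reproduced here.

```python
import random


def expert_model(state, action):
    # Hypothetical expert prior with slightly wrong coefficients (assumption).
    return 0.8 * state + 0.5 * action


def true_environment(state, action):
    # Hypothetical real dynamics, unknown to the planner (assumption).
    return 0.9 * state + 0.3 * action


class AdaptiveModel:
    """Linear stand-in for the neural network: next_state = w_s*state + w_a*action."""

    def __init__(self, w_s, w_a):
        self.w_s, self.w_a = w_s, w_a

    def predict(self, state, action):
        return self.w_s * state + self.w_a * action

    def update(self, triplets, lr=0.05, epochs=200):
        # Stochastic gradient descent on the squared one-step prediction error.
        for _ in range(epochs):
            for s, a, s_next in triplets:
                err = self.predict(s, a) - s_next
                self.w_s -= lr * err * s
                self.w_a -= lr * err * a


def mean_abs_error(model_fn, triplets):
    return sum(abs(model_fn(s, a) - sn) for s, a, sn in triplets) / len(triplets)


rng = random.Random(0)

# Start from the expert's coefficients, so the model initially matches the prior.
model = AdaptiveModel(w_s=0.8, w_a=0.5)

# Collect state-action-next-state triplets from the (simulated) real environment.
triplets = []
for _ in range(50):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    triplets.append((s, a, true_environment(s, a)))

error_before = mean_abs_error(model.predict, triplets)
model.update(triplets)  # periodic adaptation step
error_after = mean_abs_error(model.predict, triplets)
```

In a planner, `model.predict` would serve as the simulator used to expand the search tree, with `update` called each time a new batch of triplets becomes available.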

Year: 2024

Keywords: Model-based reinforcement learning, Learning dynamics models, Monte Carlo tree search, Planning and learning, Adaptive systems

Appears in types: 01.01 Journal article

Files in this product:

File	Access	Description	Type	License	Size	Format
2024_OptimLearning_Zuccotto_OnlineModelAdaptation.pdf	Open access	Paper	Publisher's version	Public domain	1.26 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1131206
