Tabular Model Learning in Monte Carlo Tree Search

Alberto Castellini; Alessandro Farinelli
2023-01-01

Abstract

We present Monte Carlo Tree Search with Tabular Model Learning (MCTS-TML), an extension of MCTS that does not require prior knowledge of the environment's transition model, since it learns and adapts the model while interacting with the environment. MCTS-TML assumes discrete states and actions, hence it uses a tabular representation of the transition model. The model update strategy is inspired by that of Dyna-Q, but MCTS-TML is more sample efficient: it requires fewer interactions with the environment to learn a good policy. Furthermore, MCTS-TML scales to much larger state spaces (i.e., environments) because it computes the policy online, focusing only on the current state of the system rather than on all possible states. We also show that MCTS-TML outperforms Q-learning, a popular model-free reinforcement learning algorithm equivalent to Dyna-Q with no planning steps. An empirical evaluation on both deterministic and stochastic environments shows that the sample efficiency of MCTS-TML is higher than that of Dyna-Q and Q-learning.
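To make the idea in the abstract concrete, the following Python sketch illustrates count-based tabular model learning of the kind the abstract describes: real transitions update visit counts and reward sums, and a planner such as MCTS can then sample simulated transitions from the learned model instead of querying the true environment. This is a minimal, hypothetical illustration; the class and method names are ours, not the authors' implementation.

    import random
    from collections import defaultdict

    class TabularModel:
        """Count-based tabular model of a discrete MDP (illustrative
        sketch, not the paper's code). Learned from real transitions,
        in the Dyna-Q style of model learning."""

        def __init__(self):
            # (s, a) -> {s': visit count}
            self.counts = defaultdict(lambda: defaultdict(int))
            # (s, a, s') -> cumulative observed reward
            self.reward_sum = defaultdict(float)

        def update(self, s, a, r, s_next):
            """Record one real transition (s, a) -> s' with reward r."""
            self.counts[(s, a)][s_next] += 1
            self.reward_sum[(s, a, s_next)] += r

        def sample(self, s, a):
            """Sample a successor state and its mean reward from the
            learned model; an MCTS simulation could call this in place
            of the true environment. Returns None if (s, a) has never
            been observed, so the planner must fall back (e.g., explore)."""
            successors = self.counts[(s, a)]
            total = sum(successors.values())
            if total == 0:
                return None
            s_next = random.choices(list(successors),
                                    weights=list(successors.values()))[0]
            r = self.reward_sum[(s, a, s_next)] / successors[s_next]
            return s_next, r

Because the empirical transition frequencies converge to the true probabilities as counts grow, simulations drawn from such a model become increasingly faithful, which is what allows a planner to reuse each real interaction many times and gain sample efficiency over purely model-free updates.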
Keywords: Monte Carlo Tree Search, Model Learning, Model-based Reinforcement Learning, Dyna-Q
Files in this item:
File: 2023_IPS_Castellini_TabularModelLearning.pdf
Access: open access
License: Public domain
Size: 647.25 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1122075
Citations
  • Scopus: 1