Tabular Model Learning in Monte Carlo Tree Search
Alberto Castellini;Alessandro Farinelli
2023-01-01
Abstract
We present Monte Carlo Tree Search with Tabular Model Learning (MCTS-TML), an extension of MCTS that does not require prior knowledge of the environment's transition model, because it learns and adapts the model while interacting with the environment. MCTS-TML assumes discrete states and actions and therefore uses a tabular representation of the transition model. The model update strategy is inspired by that of Dyna-Q, but MCTS-TML is more sample efficient: it requires fewer interactions with the environment to learn a good policy. Furthermore, MCTS-TML can scale to much larger state spaces (i.e., environments) because it computes the policy online, focusing only on the current state of the system rather than on all possible states. We also show that MCTS-TML outperforms Q-learning, a popular model-free RL algorithm equivalent to Dyna-Q with no planning steps. MCTS-TML is evaluated empirically on both deterministic and stochastic environments, and the results show that its sample efficiency is higher than that of Dyna-Q and Q-learning.

| File | Access and license | Size | Format |
|---|---|---|---|
| 2023_IPS_Castellini_TabularModelLearning.pdf | Open access; license: public domain | 647.25 kB | Adobe PDF |
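
To make the idea of a tabular transition model from the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a count-based maximum-likelihood estimate of P(s' | s, a), which the abstract does not specify; the class name `TabularModel` and its methods are hypothetical. A planner such as MCTS could call `sample` inside its simulations in place of the true environment model.

```python
import random
from collections import defaultdict


class TabularModel:
    """Hypothetical count-based estimate of P(s' | s, a) for discrete states and actions."""

    def __init__(self):
        # counts[(state, action)][next_state] = number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        """Record one real transition observed in the environment."""
        self.counts[(state, action)][next_state] += 1

    def probability(self, state, action, next_state):
        """Empirical probability of next_state; 0.0 if (state, action) is unseen."""
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return 0.0
        return outcomes.get(next_state, 0) / sum(outcomes.values())

    def sample(self, state, action, rng=random):
        """Sample a next state from the learned model; None if (state, action) is unseen."""
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return None
        next_states = list(outcomes)
        weights = [outcomes[s] for s in next_states]
        return rng.choices(next_states, weights=weights, k=1)[0]


# Usage: a stochastic transition observed four times from state 0 under action "right".
model = TabularModel()
for observed_next in (1, 1, 1, 0):
    model.update(0, "right", observed_next)
print(model.probability(0, "right", 1))  # 3 of 4 observations -> 0.75
```

Returning `None` for unseen state-action pairs is one possible design choice; how MCTS-TML itself treats unvisited pairs is not stated in the abstract.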