Learning to Model What Matters – Representations and World Models for Efficient Reinforcement Learning

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Voelcker, Claas Alexander
Published: ProQuest Dissertations & Theses
Online Access: Citation/Abstract; Full Text - PDF
Description
Abstract: Reinforcement learning (RL) is one of the most promising pathways towards building decision-making systems that can learn from their own successes and mistakes. Despite this potential, however, RL agents often struggle to learn complex tasks, proving inefficient in terms of both samples and computational resources, and unstable in practice. To enable RL-based agents to live up to their potential, we need to address these limitations.

To this end, we take a close look at the mechanisms that lead to unstable and inefficient value function learning with neural networks. Learned value functions overestimate true returns during training, and this overestimation is linked to unstable learning in the feature representation layers of neural networks. To counteract this, we show the need for proper normalization of learned value approximations. Building on this insight, we then investigate model-based auxiliary tasks to stabilize feature learning further. We find that model-based self-prediction, in combination with value learning, leads to stable features.

Moving beyond feature learning, we investigate decision-aware model learning. We find that, similar to the issues encountered in representation learning, tying model updates to the value function can lead to unstable and even diverging model learning. In observation-space models, this problem can be mitigated by using the value function's gradient to measure its sensitivity to model errors. We then combine our insights into representation learning and model learning: we discuss the family of value-aware model learning algorithms and show how to extend their losses to account for learning with stochastic models. Finally, we show that combining all of these insights in a unified architecture leads to stable and efficient value function learning.
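The first remedy the abstract names, proper normalization of learned value approximations, is commonly instantiated by inserting layer normalization into the value network's feature layers. Below is a minimal PyTorch sketch of that idea; the class name, layer sizes, and architecture are illustrative assumptions, not code from the dissertation.

import torch
import torch.nn as nn

class NormalizedValueNetwork(nn.Module):
    """Value network with LayerNorm in the feature trunk (illustrative sketch).

    Normalizing the learned features bounds their scale, which counteracts
    the feedback loop between value overestimation and unstable
    representation learning that the abstract describes."""

    def __init__(self, obs_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),  # normalize features after each layer
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar value estimate
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.trunk(obs)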
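The model-based self-prediction auxiliary task pairs TD learning with a latent dynamics model that predicts the embedding of the next observation under a stop-gradient target. The following is a hedged sketch under assumed components (an encoder, a latent transition model, and a slowly updated target encoder); all names are hypothetical.

import torch
import torch.nn.functional as F

def self_prediction_loss(encoder, latent_model, target_encoder,
                         obs, act, next_obs):
    """Latent self-prediction auxiliary loss (illustrative sketch).

    The latent model predicts the next latent state; the regression
    target is the target encoder's embedding of the true next
    observation, with gradients blocked so the encoder cannot trivially
    collapse the target."""
    z = encoder(obs)               # current latent state
    z_pred = latent_model(z, act)  # predicted next latent state
    with torch.no_grad():          # stop-gradient on the target
        z_target = target_encoder(next_obs)
    return F.mse_loss(z_pred, z_target)

In training, this term would be added to the TD objective with a weighting coefficient, e.g. total_loss = td_loss + aux_weight * self_prediction_loss(...), so that representation learning is stabilized without replacing value learning.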
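For observation-space models, the abstract proposes using the value function's gradient to measure its sensitivity to model errors. A minimal sketch of such a value-gradient-weighted model loss, in the spirit of value-aware model learning, follows; the deterministic model interface and function names are assumptions for illustration.

import torch

def value_gradient_weighted_loss(model, value_fn, obs, act, next_obs):
    """Value-gradient-weighted model loss (illustrative sketch).

    Instead of a plain squared error, the prediction error is projected
    onto the gradient of the value function at the true next state, so
    errors in directions the value function is insensitive to are
    penalized less."""
    next_obs = next_obs.clone().requires_grad_(True)
    v = value_fn(next_obs).sum()
    (grad_v,) = torch.autograd.grad(v, next_obs)  # dV/ds' at true next state
    pred_next = model(obs, act)                   # deterministic prediction
    err = pred_next - next_obs.detach()
    # Squared inner product between value gradient and model error.
    return ((grad_v.detach() * err).sum(dim=-1) ** 2).mean()

The detach() calls keep this objective from updating the value function through the model loss, reflecting the abstract's observation that tying model updates too tightly to the value function can destabilize learning.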
ISBN: 9798265437457
Source: ProQuest Dissertations & Theses Global