Learning to Model What Matters – Representations and World Models for Efficient Reinforcement Learning

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Voelcker, Claas Alexander
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Artificial intelligence
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3276209807
003 UK-CbPIL
020 |a 9798265437457 
035 |a 3276209807 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Voelcker, Claas Alexander 
245 1 |a Learning to Model What Matters – Representations and World Models for Efficient Reinforcement Learning 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Reinforcement learning (RL) is one of the most promising pathways towards building decision-making systems that can learn from their own successes and mistakes. However, despite their potential, RL agents often struggle to learn complex tasks, proving inefficient, in terms of both samples and computational resources, and unstable in practice. To enable RL-based agents to live up to their potential, we need to address these limitations. To this end, we take a close look at the mechanisms that lead to unstable and inefficient value function learning with neural networks. Learned value functions overestimate true returns during training, and this overestimation is linked to unstable learning in the feature representation layers of neural networks. To counteract this, we show the need for proper normalization of learned value approximations. Building on this insight, we then investigate model-based auxiliary tasks to stabilize feature learning further. We find that model-based self-prediction, in combination with value learning, leads to stable features. Moving beyond feature learning, we investigate decision-aware model learning. We find that, similar to the issues encountered in representation learning, tying model updates to the value function can lead to unstable and even diverging model learning. This problem can be mitigated in observation-space models by using the value function gradient to measure its sensitivity to model errors. We then move on to combine our insights into representation learning and model learning. We discuss the family of value-aware model learning algorithms and show how to extend its losses to account for learning with stochastic models. Finally, we show that combining all previous insights in a unified architecture can lead to stable and efficient value function learning. 
653 |a Computer science 
653 |a Artificial intelligence 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3276209807/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3276209807/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
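
Editor's note: the abstract describes measuring the value function's sensitivity to model errors via its gradient. The following is a minimal, illustrative sketch of one way such a value-gradient weighted model loss could be written; it is not taken from the thesis. The PyTorch framing and all names (world_model, value_fn, obs, act, next_obs) are assumptions for illustration only.

import torch

def value_gradient_weighted_model_loss(world_model, value_fn, obs, act, next_obs):
    # Hypothetical helpers: world_model(obs, act) predicts the next observation,
    # value_fn(s) returns a learned state-value estimate. Both are placeholders.
    pred_next = world_model(obs, act)

    # Differentiate the value function at the true next state to obtain dV/ds',
    # i.e. how sensitive the value estimate is to each observation dimension.
    next_obs_var = next_obs.detach().requires_grad_(True)
    grad_v = torch.autograd.grad(value_fn(next_obs_var).sum(), next_obs_var)[0]

    # Penalize model errors along directions the value function is sensitive to;
    # errors in value-irrelevant observation dimensions contribute little.
    weighted_error = (grad_v.detach() * (pred_next - next_obs.detach())).sum(dim=-1)
    return (weighted_error ** 2).mean()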