Transfer Q-Learning with Composite MDP Structures
Abstract
To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models and allows for more realistic scenarios in which tasks share core dynamics but maintain individual variations. We then propose UCB-TQL (Upper Confidence Bound Transfer Q-Learning), an algorithm designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When UCB-TQL is applied to a target task after training on a source task with sufficiently many trajectories, it achieves a regret bound of $\tilde{\mathcal{O}}(\sqrt{eH^5N})$ that is independent of the ambient dimension. Here, $N$ is the number of trajectories in the target task and $e$ quantifies the sparse differences between tasks. This result demonstrates a substantial improvement over single-task RL, obtained by leveraging the structural similarities between the source and target tasks. Our theoretical analysis provides rigorous guarantees on how UCB-TQL exploits shared dynamics while adapting to task-specific variations.
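The composite structure described in the abstract can be illustrated with a small numerical sketch. This is only a toy illustration under stated assumptions, not the paper's parametrization: the dimension `d`, the rank `rank`, the helper names `low_rank_part` and `sparse_part`, and the choice to give the source task no sparse part of its own are all hypothetical. It shows a shared low-rank matrix plus a sparse task-specific perturbation with `e` nonzero entries.

```python
import numpy as np


def low_rank_part(d, rank, rng):
    """Shared structure: a d x d matrix of rank at most `rank` (hypothetical helper)."""
    left = rng.normal(size=(d, rank))
    right = rng.normal(size=(d, rank))
    return left @ right.T


def sparse_part(d, num_nonzero, rng):
    """Task-specific variation: a d x d matrix with exactly `num_nonzero` nonzeros."""
    sparse = np.zeros((d, d))
    flat_idx = rng.choice(d * d, size=num_nonzero, replace=False)
    rows, cols = np.unravel_index(flat_idx, (d, d))
    # Values are drawn away from zero so every chosen entry is truly nonzero.
    sparse[rows, cols] = rng.uniform(0.5, 1.5, size=num_nonzero)
    return sparse


rng = np.random.default_rng(0)
d, rank, e = 20, 3, 5  # illustrative sizes only

shared = low_rank_part(d, rank, rng)  # structure common to both tasks
delta = sparse_part(d, e, rng)        # sparse difference between tasks

# Simplifying assumption: the source task has no sparse part of its own.
source_core = shared
target_core = shared + delta

print("rank of shared part:", np.linalg.matrix_rank(shared))
print("entries where tasks differ:", np.count_nonzero(target_core - source_core))
```

In this toy setting the two tasks differ in only `e` of the `d * d` entries, which is the kind of sparse discrepancy that the quantity $e$ in the regret bound is meant to capture.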
Cite
Text
Chai et al. "Transfer Q-Learning with Composite MDP Structures." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Chai et al. "Transfer Q-Learning with Composite MDP Structures." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/chai2025icml-transfer/)
BibTeX
@inproceedings{chai2025icml-transfer,
title = {{Transfer Q-Learning with Composite MDP Structures}},
author = {Chai, Jinhang and Chen, Elynn and Yang, Lin},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {7089--7106},
volume = {267},
url = {https://mlanthology.org/icml/2025/chai2025icml-transfer/}
}