Identifiable Latent Bandits: Combining Observational Data and Exploration for Personalized Healthcare
Abstract
Bandit algorithms hold great promise for improving personalized decision-making but are notoriously sample-hungry. In most health applications, it is infeasible to fit a new bandit for each patient, and observable variables are often insufficient to determine optimal treatments, which rules out applying contextual bandits learned from multiple patients. Latent bandits offer both rapid exploration and personalization beyond what context variables can reveal, but they require that a latent variable model can be learned consistently. In this work, we propose bandit algorithms based on nonlinear independent component analysis that can provably be identified from observational data to a degree sufficient to consistently infer the optimal action in a new bandit instance. We verify this strategy on simulated data, showing substantial improvement over learning independent multi-armed bandits for every instance.
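The latent-bandit idea referred to in the abstract can be illustrated with a minimal sketch: assume a reward model over a small set of latent states has already been learned offline (in the paper this comes from an identifiable nonlinear ICA model; here it is simply drawn at random), and let a posterior over the latent state drive arm selection for a new instance via Thompson sampling. All names and parameters below (`mu`, `K`, `A`, `sigma`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K latent states, A arms. The reward model mu[z, a]
# is assumed to have been learned offline from observational data
# (in the paper, via an identifiable nonlinear ICA model); here it is
# random for illustration only.
K, A, horizon = 5, 4, 200
mu = rng.uniform(0.0, 1.0, size=(K, A))   # mean reward per (latent state, arm)
true_z = rng.integers(K)                  # latent state of the new instance (unknown to the agent)

log_post = np.full(K, -np.log(K))         # uniform prior over latent states
sigma = 0.1                               # assumed known Gaussian reward noise

for t in range(horizon):
    # Thompson sampling over the latent state: sample z from the posterior,
    # then play the arm that is optimal for the sampled state.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    z_hat = rng.choice(K, p=post)
    a = int(np.argmax(mu[z_hat]))

    # Observe a noisy reward generated by the true latent state.
    r = mu[true_z, a] + sigma * rng.normal()

    # Bayesian update of the latent-state posterior given (a, r).
    log_post += -0.5 * ((r - mu[:, a]) / sigma) ** 2
    log_post -= log_post.max()            # numerical stabilization

print("posterior mode:", int(np.argmax(log_post)), "true latent state:", true_z)
```

In this toy setting, the posterior over latent states typically concentrates after a handful of pulls because the reward model is fixed in advance, which illustrates the sample-efficiency gain the abstract contrasts with fitting an independent multi-armed bandit for every instance.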
Cite
Balcıoğlu et al. "Identifiable Latent Bandits: Combining Observational Data and Exploration for Personalized Healthcare." ICML 2024 Workshops: RLControlTheory, 2024.
@inproceedings{balcoglu2024icmlw-identifiable,
title = {{Identifiable Latent Bandits: Combining Observational Data and Exploration for Personalized Healthcare}},
author = {Balcıoğlu, Ahmet Zahid and Carlsson, Emil and Johansson, Fredrik D.},
booktitle = {ICML 2024 Workshops: RLControlTheory},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/balcoglu2024icmlw-identifiable/}
}