Dann, Christoph
35 publications
NeurIPSW
2024
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
NeurIPSW
2024
P3O: Pessimistic Preference-Based Policy Optimization for Robust Alignment from Preferences
NeurIPS
2021
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning