Hong, Joey

22 publications

ICML 2025 LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Marwa Abdulhai, Isadora White, Charlie Victor Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
NeurIPS 2025 Planning Without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL Joey Hong, Anca Dragan, Sergey Levine
ICLR 2025 Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning Joey Hong, Anca Dragan, Sergey Levine
ICLR 2024 ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton
ICML 2024 Learning to Explore in POMDPs with Informational Rewards Annie Xie, Logan Mondal Bhamidipaty, Evan Zheran Liu, Joey Hong, Sergey Levine, Chelsea Finn
ICLR 2024 Offline RL with Observation Histories: Analyzing and Improving Sample Complexity Joey Hong, Anca Dragan, Sergey Levine
ICLR 2023 Confidence-Conditioned Value Functions for Offline Reinforcement Learning Joey Hong, Aviral Kumar, Sergey Levine
NeurIPS 2023 Learning to Influence Human Behavior with Offline Reinforcement Learning Joey Hong, Sergey Levine, Anca Dragan
ICML 2023 Multi-Task Off-Policy Learning from Bandit Feedback Joey Hong, Branislav Kveton, Manzil Zaheer, Sumeet Katariya, Mohammad Ghavamzadeh
ICLR 2023 On the Sensitivity of Reward Inference to Misspecified Human Models Joey Hong, Kush Bhatia, Anca Dragan
NeurIPSW 2023 Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations Joey Hong, Sergey Levine, Anca Dragan
AISTATS 2022 Hierarchical Bayesian Bandits Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
AISTATS 2022 Thompson Sampling with a Mixture Prior Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
ICLRW 2022 Compositional Generalization and Decomposition in Neural Program Synthesis Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
NeurIPSW 2022 Confidence-Conditioned Value Functions for Offline Reinforcement Learning Joey Hong, Aviral Kumar, Sergey Levine
NeurIPSW 2022 Confidence-Conditioned Value Functions for Offline Reinforcement Learning Joey Hong, Aviral Kumar, Sergey Levine
ICML 2022 Deep Hierarchy in Bandits Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
ICLR 2022 Should I Run Offline Reinforcement Learning or Behavioral Cloning? Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
AISTATS 2021 Non-Stationary Off-Policy Optimization Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed
ICML 2021 Latent Programmer: Discrete Latent Codes for Program Synthesis Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer
NeurIPSW 2021 Should I Run Offline Reinforcement Learning or Behavioral Cloning? Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
NeurIPS 2020 Latent Bandits Revisited Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier