ML Anthology
Hong, Joey
22 publications
ICML 2025
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charlie Victor Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
NeurIPS 2025
Planning Without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Joey Hong, Anca Dragan, Sergey Levine
ICLR 2025
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Joey Hong, Anca Dragan, Sergey Levine
ICLR 2024
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton
ICML 2024
Learning to Explore in POMDPs with Informational Rewards
Annie Xie, Logan Mondal Bhamidipaty, Evan Zheran Liu, Joey Hong, Sergey Levine, Chelsea Finn
ICLR 2024
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Joey Hong, Anca Dragan, Sergey Levine
ICLR 2023
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Joey Hong, Aviral Kumar, Sergey Levine
NeurIPS 2023
Learning to Influence Human Behavior with Offline Reinforcement Learning
Joey Hong, Sergey Levine, Anca Dragan
ICML 2023
Multi-Task Off-Policy Learning from Bandit Feedback
Joey Hong, Branislav Kveton, Manzil Zaheer, Sumeet Katariya, Mohammad Ghavamzadeh
ICLR 2023
On the Sensitivity of Reward Inference to Misspecified Human Models
Joey Hong, Kush Bhatia, Anca Dragan
NeurIPSW 2023
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong, Sergey Levine, Anca Dragan
AISTATS 2022
Hierarchical Bayesian Bandits
Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
AISTATS 2022
Thompson Sampling with a Mixture Prior
Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
ICLRW 2022
Compositional Generalization and Decomposition in Neural Program Synthesis
Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
NeurIPSW 2022
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Joey Hong, Aviral Kumar, Sergey Levine
NeurIPSW 2022
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Joey Hong, Aviral Kumar, Sergey Levine
ICML 2022
Deep Hierarchy in Bandits
Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
ICLR 2022
Should I Run Offline Reinforcement Learning or Behavioral Cloning?
Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
AISTATS 2021
Non-Stationary Off-Policy Optimization
Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed
ICML 2021
Latent Programmer: Discrete Latent Codes for Program Synthesis
Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer
NeurIPSW 2021
Should I Run Offline Reinforcement Learning or Behavioral Cloning?
Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
NeurIPS 2020
Latent Bandits Revisited
Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier