Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes

Abstract

This paper introduces a framework that integrates fine-tuning of large language models (LLMs) with reinforcement learning (RL) for dynamic treatment regimes (DTRs). Within the RL training loop, our bilevel-LLM framework uses feedback signals from the DTR environment for "RL with Environment Feedback" (RLEF) fine-tuning, achieving best-of-both-worlds results. Experimental results show that the LLM-RLEF agent outperforms both existing RL policies and pure LLM policies on the *SimGlucoseEnv* treatment-regime task, improving sample efficiency, generalizability, and interpretability. Beyond improving DTR performance, RLEF also improves the LLM's question-answering ability on the MMLU-Med, MedQA, and MedMCQA benchmarks.
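
The abstract describes RLEF as an RL loop in which feedback from the DTR environment drives fine-tuning of the LLM policy. The paper's code is not reproduced on this page; the following is a minimal, hypothetical sketch of such a loop, assuming a REINFORCE-style update, a small PyTorch network standing in for the LLM policy, and a `ToyGlucoseEnv` stub in place of the real SimGlucoseEnv (whose observation, action, and reward API may differ).

```python
# Minimal sketch of an RLEF-style loop (not the authors' code).
# Assumptions: the LLM policy is stood in for by a small network scoring a
# discrete set of insulin doses; ToyGlucoseEnv is a stub for SimGlucoseEnv.
import torch
import torch.nn as nn

class ToyGlucoseEnv:
    """Stub environment: state = blood glucose; reward peaks near 110 mg/dL."""
    def reset(self):
        self.glucose = 180.0
        return torch.tensor([self.glucose / 400.0])

    def step(self, dose):
        # Hypothetical dynamics: insulin lowers glucose, background drift raises it.
        self.glucose += 10.0 - 15.0 * dose
        reward = -abs(self.glucose - 110.0) / 100.0
        done = not (40.0 < self.glucose < 400.0)
        return torch.tensor([self.glucose / 400.0]), reward, done

DOSES = [0.0, 0.5, 1.0, 2.0]  # discrete insulin doses (units), for illustration

policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, len(DOSES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

env = ToyGlucoseEnv()
for episode in range(200):
    obs, log_probs, rewards, done = env.reset(), [], [], False
    for _ in range(48):  # one simulated day at 30-minute steps
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        obs, reward, done = env.step(DOSES[action.item()])
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        if done:
            break
    # REINFORCE: weight each action's log-probability by its reward-to-go.
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the actual framework the policy would be the LLM itself and the update a fine-tuning step on its parameters; the stub only illustrates how environment feedback flows into the policy update.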

Cite

Text

Liu et al. "Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes." NeurIPS 2024 Workshops: AIM-FM, 2024.

Markdown

[Liu et al. "Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/liu2024neuripsw-best/)

BibTeX

@inproceedings{liu2024neuripsw-best,
  title     = {{Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes}},
  author    = {Liu, Hongxuan and Luo, Zhiyao and Zhu, Tingting},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/liu2024neuripsw-best/}
}