Enhancing Question Generation Through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition
Abstract
Recent advancements in question generation (QG) have been significantly propelled by reinforcement learning (RL). Although numerous reward models have been designed to capture the attributes of ideal questions, the learning challenges they introduce, particularly in sample efficiency and diversity, remain underexplored. This paper introduces a bilevel policy decomposition (BPD) framework and a diversity-seeking RL (DSRL) objective to address these issues. The BPD framework uses two cascading policies to divide QG into two more manageable sub-tasks, answer-centric summary generation and summary-augmented QG, which facilitates exploration and accelerates policy learning. Concurrently, the DSRL objective preserves the inherent diversity of QG by ensuring the bilevel policies align probabilistically with their reward models rather than merely maximizing returns. Our integrated approach, named BPD-DSRL, demonstrates superior performance over existing baselines on multiple question quality and diversity metrics across various QG benchmarks.
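The abstract describes the two components only at a high level. As an illustration, the minimal Python sketch below shows (i) how two cascading policies could split QG into answer-centric summary generation followed by summary-augmented QG, and (ii) a diversity-seeking loss that matches the policy distribution over sampled candidates to a reward-induced target rather than maximizing expected return. All names (`summary_policy`, `qg_policy`, `diversity_seeking_loss`) and numbers are hypothetical and not taken from the paper; this is a sketch of the general idea, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two ideas named in the abstract:
# 1) Bilevel policy decomposition: cascade a summary policy and a QG policy.
# 2) Diversity-seeking objective: fit the policy to a reward-induced target
#    distribution over sampled candidates instead of pure return maximization.

import numpy as np

def diversity_seeking_loss(logprobs, rewards, temperature=1.0):
    """Cross-entropy between a reward-proportional target over K sampled
    outputs and the policy's distribution re-normalized over the same outputs.

    logprobs: policy log-probabilities of the K samples, shape (K,)
    rewards:  scalar rewards of the same K samples, shape (K,)
    """
    target = np.exp(rewards / temperature)
    target /= target.sum()                      # reward-induced target
    policy = np.exp(logprobs - logprobs.max())
    policy /= policy.sum()                      # policy restricted to samples
    return -(target * np.log(policy + 1e-12)).sum()

def cascade_generate(summary_policy, qg_policy, passage, answer, k=4):
    """Bilevel cascade: sample answer-centric summaries, then generate a
    question conditioned on each summary (both policies are hypothetical
    callables standing in for learned sequence models)."""
    summaries = [summary_policy(passage, answer) for _ in range(k)]
    questions = [qg_policy(passage, s, answer) for s in summaries]
    return summaries, questions

if __name__ == "__main__":
    # Toy numbers only: four sampled questions with log-probs and rewards.
    logprobs = np.array([-3.2, -2.8, -4.0, -3.5])
    rewards = np.array([0.7, 0.9, 0.2, 0.6])
    print(diversity_seeking_loss(logprobs, rewards))
```

Because the loss pulls the policy toward a distribution proportional to reward rather than toward the single highest-reward candidate, several distinct high-reward questions retain non-negligible probability, which is the diversity-preserving behavior the abstract refers to.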
Cite
Text
Ren et al. "Enhancing Question Generation Through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34693
Markdown
[Ren et al. "Enhancing Question Generation Through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/ren2025aaai-enhancing/) doi:10.1609/AAAI.V39I23.34693
BibTeX
@inproceedings{ren2025aaai-enhancing,
title = {{Enhancing Question Generation Through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition}},
author = {Ren, Tianyu and Wang, Hui and Rafferty, Karen},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {25083--25091},
doi = {10.1609/AAAI.V39I23.34693},
url = {https://mlanthology.org/aaai/2025/ren2025aaai-enhancing/}
}