Tran, Thien Q.

3 publications

ICLRW 2024: Initial Response Selection for Prompt Jailbreaking Using Model Steering. Thien Q. Tran, Koki Wataoka, Tsubasa Takahashi
NeurIPS 2024: Stepwise Alignment for Constrained Language Model Policy Optimization. Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
AAAI 2022: Unsupervised Causal Binary Concepts Discovery with VAE for Black-Box Model Explanation. Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma