ML Anthology
Tran, Thien Q.
3 publications
ICLRW
2024
Initial Response Selection for Prompt Jailbreaking Using Model Steering
Thien Q. Tran, Koki Wataoka, Tsubasa Takahashi
NeurIPS
2024
Stepwise Alignment for Constrained Language Model Policy Optimization
Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
AAAI
2022
Unsupervised Causal Binary Concepts Discovery with VAE for Black-Box Model Explanation
Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma