ML Anthology
Zhi, Gong
1 publication
ICLR 2025
Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang, Shiqi Shen, Guangyao Shen, Wei Yao, Yong Liu, Gong Zhi, Yankai Lin, Ji-Rong Wen