Zhi, Gong

1 publication

ICLR 2025. Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization. Wenkai Yang, Shiqi Shen, Guangyao Shen, Wei Yao, Yong Liu, Gong Zhi, Yankai Lin, Ji-Rong Wen