Yan, Liu

2 publications

ICLR 2026. Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions. Yuntai Bao, Xuhong Zhang, Jintao Chen, Ge Su, Yuxiang Cai, Hao Peng, Sun Bing, Haiqin Weng, Liu Yan, Jianwei Yin
ICLR 2025. A Benchmark for Semantic Sensitive Information in LLMs Outputs. Qingjie Zhang, Han Qiu, Di Wang, Yiming Li, Tianwei Zhang, Wenyu Zhu, Haiqin Weng, Liu Yan, Chao Zhang