Xiao, Yuxin

3 publications

NeurIPS 2025. KScope: A Framework for Characterizing the Knowledge Status of Language Models. Yuxin Xiao, Shan Chen, Jack Gallifant, Danielle Bitterman, Thomas Hartvigsen, Marzyeh Ghassemi
ICML 2025. Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions. Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi
NeurIPS 2024. Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control. Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Xu Shen, Jieping Ye