ML Anthology
Niu, Luyao
11 publications
AAAI
2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
ICLRW
2025
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran
ICLR
2025
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin
ICLRW
2025
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran
ICLRW
2025
Stronger Models Are NOT Always Stronger Teachers for Instruction Tuning
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
ICLRW
2024
ArtPrompt: ASCII Art-Based Jailbreak Attacks Against Aligned LLMs
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
NeurIPSW
2024
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
ICLRW
2024
SafeDecoding: Defending Against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran
NeurIPS
2023
FedGame: A Game-Theoretic Defense Against Backdoor Attacks in Federated Learning
Jinyuan Jia, Zhuowen Yuan, Dinuka Sahabandu, Luyao Niu, Arezoo Rajabi, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
NeurIPSW
2023
Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, Radha Poovendran
IJCAI
2023
Learning Dissemination Strategies for External Sources in Opinion Dynamic Models with Cognitive Biases
Abdullah Al Maruf, Luyao Niu, Bhaskar Ramasubramanian, Andrew Clark, Radha Poovendran