Panaitescu-Liess, Michael-Andrei
12 publications
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks. ICLRW, 2025.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? AAAI, 2025.
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment. NeurIPSW, 2024.
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models. ICMLW, 2024.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? ICMLW, 2024.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? NeurIPSW, 2024.