Nöther, Jonathan

5 publications

AISTATS 2025: Policy Teaching via Data Poisoning in Learning from Human Preferences. Andi Nika, Jonathan Nöther, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, Goran Radanovic
AAAI 2025: Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints. Jonathan Nöther, Adish Singla, Goran Radanovic
TMLR 2024: Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria. Andi Nika, Jonathan Nöther, Adish Singla, Goran Radanovic
ICMLW 2024: Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria. Andi Nika, Jonathan Nöther, Adish Singla, Goran Radanovic
NeurIPSW 2024: Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints. Jonathan Nöther, Adish Singla, Goran Radanovic