ML Anthology
Nöther, Jonathan
5 publications
AISTATS
2025
Policy Teaching via Data Poisoning in Learning from Human Preferences
Andi Nika, Jonathan Nöther, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, Goran Radanovic
AAAI
2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther, Adish Singla, Goran Radanovic
TMLR
2024
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Andi Nika, Jonathan Nöther, Adish Singla, Goran Radanovic
ICMLW
2024
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Andi Nika, Jonathan Nöther, Adish Singla, Goran Radanovic
NeurIPSW
2024
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther, Adish Singla, Goran Radanovic