ML Anthology
Gallego, Víctor
3 publications
ICLRW
2025
MetaSC: Test-Time Safety Specification Optimization for Language Models
Victor Gallego
ICMLW
2024
Merging Improves Self-Critique Against Jailbreak Attacks
Victor Gallego
AAAI
2019
Reinforcement Learning Under Threats
Víctor Gallego, Roi Naveiro, David Ríos Insua