Kaunismaa, Jackson

2 publications

ICLR 2026 Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs Jackson Kaunismaa, John Hughes, Christina Q Knight, Avery Griffin, Mrinank Sharma, Erik Jones
ICLRW 2025 A Benchmark for Scalable Oversight Mechanisms Abhimanyu Pallavi Sudhir, Jackson Kaunismaa, Arjun Panickssery