ML Anthology
Terekhov, Mikhail
6 publications
ICLR
2026
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
ICLR
2026
Control Tax: The Price of Keeping AI in Check
Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie
NeurIPS
2025
One-Step Is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Viacheslav Surkov, Chris Wendler, Antonio Mari, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre, David Bau
ICMLW
2024
In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
Mikhail Terekhov, Caglar Gulcehre
NeurIPSW
2023
Second-Order Jailbreaks: Generative Agents Successfully Manipulate Through an Intermediary
Mikhail Terekhov, Romain Graux, Eduardo Neville, Denis Rosset, Gabin Kolly
ICCV
2023
Tangent Sampson Error: Fast Approximate Two-View Reprojection Error for Central Camera Models
Mikhail Terekhov, Viktor Larsson