ML Anthology
Panfilov, Alexander
9 publications
ICLR
2026
ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Alexandra Volkova, Rush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert
ICLR
2026
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
ICLR
2026
Capability-Based Scaling Trends for LLM-Based Red-Teaming
Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
ICLR
2026
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping
ICLRW
2025
ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert
ICML
2025
An Interpretable N-Gram Perplexity Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko, Alexander Panfilov, Vaclav Voracek, Matthias Hein, Jonas Geiping
NeurIPSW
2024
A Realistic Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko, Alexander Panfilov, Vaclav Voracek, Matthias Hein, Jonas Geiping
ICLR
2024
Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel
CoLLAs
2023
A Minimalist Approach for Domain Adaptation with Optimal Transport
Arip Asadulaev, Vitaly Shutov, Alexander Korotin, Alexander Panfilov, Vladislava Kontsevaya, Andrey Filchenkov