ML Anthology
Balagansky, Nikita
4 publications
ICML
2025
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov
ICLR
2025
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov
ICLR
2025
Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
NeurIPS
2022
PALBERT: Teaching ALBERT to Ponder
Nikita Balagansky, Daniil Gavrilov