Soo, Samuel

1 publication

ICLRW 2025 Interpretable Steering of Large Language Models with Feature Guided Activation Additions Samuel Soo, Wesley Teng, Chandrasekaran Balaganesh, Tan Guoxian, Ming Yan