Wu, Ben Peng

3 publications

ICLRW 2025. Antipodal Pairing and Mechanistic Signals in Dense SAE Latents. Alessandro Stolfo, Ben Peng Wu, Mrinmaya Sachan
NeurIPS 2025. Dense SAE Latents Are Features, Not Bugs. Xiaoqing Sun, Alessandro Stolfo, Joshua Engels, Ben Peng Wu, Senthooran Rajamanoharan, Mrinmaya Sachan, Max Tegmark
ICMLW 2024. Confidence Regulation Neurons in Language Models. Alessandro Stolfo, Ben Peng Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda