Karwowski, Jacek

3 publications

NeurIPSW 2024. Characterizing Stable Regions in the Residual Stream of LLMs. Jett Janiak, Jacek Karwowski, Chatrik Singh Mangat, Giorgi Giglemiani, Nora Petrova, Stefan Heimersheim
ICLR 2024. Goodhart's Law in Reinforcement Learning. Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Max Viktor Skalse
ICLRW 2024. Limitations of Agents Simulated by Predictive Models. Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria Krakovna