Kivlichan, Ian

1 publication

NeurIPS 2024. Rule Based Rewards for Language Model Safety. Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng.