ML Anthology
Vallone, Andrea
2 publications
ICMLW
2024
Rule Based Rewards for Fine-Grained LLM Safety
Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian D Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng
NeurIPS
2024
Rule Based Rewards for Language Model Safety
Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng