Moalla, Skander
5 publications
NeurIPS, 2025. Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
NeurIPS, 2024. Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers
NeurIPS, 2024. No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO