Generalizing Reward Modeling for Out-of-Distribution Preference Learning

Cite

Text

Jia. "Generalizing Reward Modeling for Out-of-Distribution Preference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70362-1_7

Markdown

[Jia. "Generalizing Reward Modeling for Out-of-Distribution Preference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/jia2024ecmlpkdd-generalizing/) doi:10.1007/978-3-031-70362-1_7

BibTeX

@inproceedings{jia2024ecmlpkdd-generalizing,
  title     = {{Generalizing Reward Modeling for Out-of-Distribution Preference Learning}},
  author    = {Jia, Chen},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {107--124},
  doi       = {10.1007/978-3-031-70362-1_7},
  url       = {https://mlanthology.org/ecmlpkdd/2024/jia2024ecmlpkdd-generalizing/}
}