Perception Score: A Learned Metric for Open-Ended Text Generation Evaluation

Abstract

Automatic evaluation for open-ended natural language generation tasks remains a challenge. We propose a learned evaluation metric, Perception Score. It leverages a pre-trained model and incorporates context information for conditional generation. Perception Score assigns a holistic score along with an uncertainty measurement. We conduct experiments on three open-ended conditional generation tasks and two open-ended unconditional generation tasks. Perception Score consistently achieves state-of-the-art correlation with human evaluation scores across all tasks.

Cite

Text

Gu et al. "Perception Score: A Learned Metric for Open-Ended Text Generation Evaluation." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I14.17526

Markdown

[Gu et al. "Perception Score: A Learned Metric for Open-Ended Text Generation Evaluation." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/gu2021aaai-perception/) doi:10.1609/AAAI.V35I14.17526

BibTeX

@inproceedings{gu2021aaai-perception,
  title     = {{Perception Score: A Learned Metric for Open-Ended Text Generation Evaluation}},
  author    = {Gu, Jing and Wu, Qingyang and Yu, Zhou},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {12902--12910},
  doi       = {10.1609/AAAI.V35I14.17526},
  url       = {https://mlanthology.org/aaai/2021/gu2021aaai-perception/}
}