AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions

Abstract

AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on more open-ended, real-world tasks. In tests of several state-of-the-art LLMs, AidanBench scores correlate only weakly with existing benchmarks while offering a more nuanced view of model performance in open-ended scenarios.
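
To make the evaluation setup concrete, the sketch below shows one plausible scoring loop for this kind of open-ended novelty benchmark: a model is repeatedly asked for new answers to a question, each answer's novelty is measured against earlier answers via embedding similarity, and the score is the number of sufficiently distinct answers produced before the model starts repeating itself. This is an illustrative reconstruction, not the authors' code; the generate_answer and embed callables, the 0.15 cutoff, and the turn limit are all assumptions.

    # Hypothetical sketch of an AidanBench-style scoring loop (not the authors' code).
    # generate_answer() and embed() are user-supplied wrappers around an LLM and an
    # embedding model; the novelty cutoff and turn limit are illustrative values.
    from typing import Callable, List
    import math


    def cosine_similarity(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


    def open_ended_score(
        question: str,
        generate_answer: Callable[[str, List[str]], str],  # question + prior answers -> new answer
        embed: Callable[[str], List[float]],                # text -> embedding vector
        novelty_cutoff: float = 0.15,                       # assumed threshold
        max_turns: int = 50,
    ) -> int:
        """Count how many sufficiently novel answers a model gives before repeating itself."""
        answers: List[str] = []
        embeddings: List[List[float]] = []
        for _ in range(max_turns):
            answer = generate_answer(question, answers)
            vec = embed(answer)
            # Novelty = 1 - similarity to the closest previous answer.
            novelty = 1.0 if not embeddings else 1.0 - max(
                cosine_similarity(vec, prev) for prev in embeddings
            )
            if novelty < novelty_cutoff:
                break  # the model has started restating earlier ideas
            answers.append(answer)
            embeddings.append(vec)
        return len(answers)  # higher = more distinct ideas before termination

Under this reading, a model's benchmark score simply aggregates these per-question counts, rewarding sustained idea generation rather than a single correct answer.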

Cite

Text

McLaughlin et al. "AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions." NeurIPS 2024 Workshops: LanGame, 2024.

Markdown

[McLaughlin et al. "AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions." NeurIPS 2024 Workshops: LanGame, 2024.](https://mlanthology.org/neuripsw/2024/mclaughlin2024neuripsw-aidanbench/)

BibTeX

@inproceedings{mclaughlin2024neuripsw-aidanbench,
  title     = {{AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions}},
  author    = {McLaughlin, Aidan and Uppuluri, Anuja and Campbell, James},
  booktitle = {NeurIPS 2024 Workshops: LanGame},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/mclaughlin2024neuripsw-aidanbench/}
}