AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions
Abstract
AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on open-ended, real-world tasks. In tests of several state-of-the-art LLMs, AidanBench correlates only weakly with existing benchmarks while offering a more nuanced view of model performance in open-ended scenarios.
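The abstract does not spell out the scoring procedure, so the sketch below is only an illustrative assumption of how a novelty-counting loop for open-ended questions could work, not the paper's published method. The names `cosine_similarity` and `count_novel_answers`, the bag-of-words similarity, and the 0.7 threshold are all hypothetical stand-ins; a real setup would more likely use a learned embedding model and a tuned threshold.

```python
# Hypothetical illustration (not taken from the paper): keep sampling answers
# to one open-ended question and count how many are mutually "novel", stopping
# once the model starts repeating itself.
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a crude stand-in for an embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def count_novel_answers(answers, novelty_threshold=0.7):
    """Count answers that are sufficiently dissimilar from every earlier one.

    An answer is accepted only if its similarity to all previously accepted
    answers stays below the (hypothetical) threshold; counting stops at the
    first repeated idea.
    """
    accepted = []
    for ans in answers:
        if all(cosine_similarity(ans, prev) < novelty_threshold for prev in accepted):
            accepted.append(ans)
        else:
            break
    return len(accepted)


if __name__ == "__main__":
    sampled = [
        "Use the brick as a doorstop.",
        "Grind the brick into pigment for paint.",
        "Use the brick as a simple doorstop.",  # near-duplicate of the first idea
    ]
    print(count_novel_answers(sampled))  # -> 2
```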
Cite
Text
McLaughlin et al. "AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions." NeurIPS 2024 Workshops: LanGame, 2024.
Markdown
[McLaughlin et al. "AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions." NeurIPS 2024 Workshops: LanGame, 2024.](https://mlanthology.org/neuripsw/2024/mclaughlin2024neuripsw-aidanbench/)
BibTeX
@inproceedings{mclaughlin2024neuripsw-aidanbench,
title = {{AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions}},
author = {McLaughlin, Aidan and Uppuluri, Anuja and Campbell, James},
booktitle = {NeurIPS 2024 Workshops: LanGame},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/mclaughlin2024neuripsw-aidanbench/}
}