Dataset MKSL for Measuring Adequate Response Performance by Knowledge Level
Abstract
Generative AI currently performs well across many fields. In particular, GPT-4, one of the leading foundation models, has been evaluated for its discourse quality, knowledge level, and problem-solving ability on various benchmark datasets. However, it remains questionable whether a foundation model can appropriately adjust the level of its output to the user's knowledge level. If the model fails to account for the user's knowledge level, the quality and reliability of the discourse inevitably decrease. Yet existing datasets remain insufficient for measuring whether a foundation model responds appropriately to the user's knowledge level. Therefore, drawing on Korean educational experts and curricula, we developed a benchmark dataset to evaluate whether a foundation model can produce discourse appropriate to the user's knowledge level. This mini-dataset consists of about 500 Korean examples centered on science and current events in science, and we introduce an evaluation method that uses it. The dataset will also be released soon, after it has been expanded.
Cite
Text
NohMyongSung and Hui. "Dataset MKSL for Measuring Adequate Response Performance by Knowledge Level." ICLR 2024 Workshops: R2-FM, 2024.
Markdown
[NohMyongSung and Hui. "Dataset MKSL for Measuring Adequate Response Performance by Knowledge Level." ICLR 2024 Workshops: R2-FM, 2024.](https://mlanthology.org/iclrw/2024/nohmyongsung2024iclrw-dataset/)
BibTeX
@inproceedings{nohmyongsung2024iclrw-dataset,
title = {{Dataset MKSL for Measuring Adequate Response Performance by Knowledge Level}},
author = {NohMyongSung, and Hui, Cho Ung},
booktitle = {ICLR 2024 Workshops: R2-FM},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/nohmyongsung2024iclrw-dataset/}
}