MedMCQA: A Large-Scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering

Abstract

This paper introduces MedMCQA, a new large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS and NEET PG entrance exam MCQs, covering 2.4k healthcare topics and 21 medical subjects, are collected, with an average token length of 12.77 and high topical diversity. Each sample contains a question, the correct answer(s), and the other options. Answering a sample requires deeper language understanding because it tests more than 10 reasoning abilities of a model across a wide range of medical subjects and topics. A detailed explanation of the solution is also provided alongside this information.
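The per-sample structure described above (question, options, correct answer(s), explanation, subject, topic) can be pictured as a simple record type. The sketch below is illustrative only. The class name `MCQASample`, its field names, and the `accuracy` helper are hypothetical and are not the dataset's official schema or loader. Correct answers are modeled as a set of option indices so that both single- and multi-answer questions fit. The example question is invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass
class MCQASample:
    """One multiple-choice item. Hypothetical field names, not the official schema."""
    question: str
    options: List[str]           # the answer choices shown to the model
    correct: FrozenSet[int]      # indices into `options` of the correct answer(s)
    explanation: Optional[str]   # free-text rationale for the solution, if available
    subject: str                 # one of the 21 medical subjects
    topic: str                   # one of the ~2.4k healthcare topics

    def is_correct(self, predicted: Set[int]) -> bool:
        """A prediction counts only if it selects exactly the gold option set."""
        return set(predicted) == set(self.correct)


def accuracy(samples: List[MCQASample], predictions: List[Set[int]]) -> float:
    """Fraction of samples whose predicted option set matches the gold set."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    if not samples:
        return 0.0
    hits = sum(s.is_correct(p) for s, p in zip(samples, predictions))
    return hits / len(samples)


# Illustrative, made-up sample (not taken from MedMCQA).
sample = MCQASample(
    question="Deficiency of which vitamin causes scurvy?",
    options=["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    correct=frozenset({2}),
    explanation="Vitamin C is required for collagen synthesis.",
    subject="Biochemistry",
    topic="Vitamins",
)
print(accuracy([sample], [{2}]))  # 1.0
```

Requiring an exact match on the option set is one simple scoring choice for multi-answer items. Partial-credit schemes are an equally plausible alternative and are not implied by the abstract.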

Cite

Text

Pal et al. "MedMCQA: A Large-Scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering." Proceedings of the Conference on Health, Inference, and Learning, 2022.

Markdown

[Pal et al. "MedMCQA: A Large-Scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering." Proceedings of the Conference on Health, Inference, and Learning, 2022.](https://mlanthology.org/chil/2022/pal2022chil-medmcqa/)

BibTeX

@inproceedings{pal2022chil-medmcqa,
  title     = {{MedMCQA: A Large-Scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering}},
  author    = {Pal, Ankit and Umapathi, Logesh Kumar and Sankarasubbu, Malaikannan},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  year      = {2022},
  pages     = {248--260},
  volume    = {174},
  url       = {https://mlanthology.org/chil/2022/pal2022chil-medmcqa/}
}