ARTICLE: Annotator Reliability Through In-Context Learning

Abstract

Ensuring annotator quality in training and evaluation data is a key part of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for traditional quality assessment approaches because it is hard to distinguish disagreement due to poor work from disagreement due to genuine differences of opinion between sincere annotators. With the goal of increasing diverse perspectives in annotation while ensuring consistency, we propose ARTICLE, an in-context learning (ICL) framework that estimates annotation quality through self-consistency. We evaluate this framework on two offensive speech datasets using multiple LLMs and compare its performance with traditional methods. Our findings indicate that ARTICLE is a robust method for identifying reliable annotators, thereby improving data quality.
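
The abstract's mechanism can be pictured concretely. Below is a minimal Python sketch of one way an ICL self-consistency check could flag unreliable annotators: hold out one of an annotator's labels, show an LLM that annotator's remaining labels as in-context examples, and ask it to predict the held-out label. This is an illustration under stated assumptions, not the authors' implementation; the prompt template, the `query_llm` helper, and the reliability threshold are all hypothetical.

```python
# Minimal sketch of an ICL self-consistency check for one annotator.
# Assumptions (not from the paper): the prompt wording, the `query_llm`
# stub, and the 0.5 reliability threshold are all illustrative.

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real completion
    client that returns 'offensive' or 'not offensive'."""
    raise NotImplementedError("plug in an LLM client here")

def build_prompt(examples, held_out_text):
    """Show the annotator's other labels as in-context examples, then
    ask the model to label the held-out item as this annotator would."""
    lines = ["Label each text as 'offensive' or 'not offensive', "
             "imitating this annotator's judgments."]
    for text, label in examples:
        lines.append(f"Text: {text}\nLabel: {label}")
    lines.append(f"Text: {held_out_text}\nLabel:")
    return "\n\n".join(lines)

def self_consistency(annotations):
    """Leave-one-out: predict each label from the annotator's remaining
    labels and return the fraction the LLM reproduces."""
    hits = 0
    for i, (text, label) in enumerate(annotations):
        context = annotations[:i] + annotations[i + 1:]
        prediction = query_llm(build_prompt(context, text))
        hits += prediction.strip().lower() == label.lower()
    return hits / len(annotations)

def is_reliable(annotations, threshold=0.5):
    """Flag annotators whose labels the model cannot reproduce from
    their own other labels; the threshold is a free parameter."""
    return self_consistency(annotations) >= threshold
```

The intuition behind such a check: an annotator whose labels cannot be reproduced from their own other judgments is more likely producing noise than a coherent minority perspective, which is what lets self-consistency separate sincere disagreement from poor work.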

Cite

Text

Dutta et al. "ARTICLE: Annotator Reliability Through In-Context Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/aaai.v39i13.33558

Markdown

[Dutta et al. "ARTICLE: Annotator Reliability Through In-Context Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/dutta2025aaai-article/) doi:10.1609/aaai.v39i13.33558

BibTeX

@inproceedings{dutta2025aaai-article,
  title     = {{ARTICLE: Annotator Reliability Through In-Context Learning}},
  author    = {Dutta, Sujan and Pandita, Deepak and Weerasooriya, Tharindu Cyril and Zampieri, Marcos and Homan, Christopher M. and KhudaBukhsh, Ashiqur R.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {14230--14237},
  doi       = {10.1609/aaai.v39i13.33558},
  url       = {https://mlanthology.org/aaai/2025/dutta2025aaai-article/}
}