ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

Shen, Yufan; Luo, Chuwei; Zhu, Zhaoqing; Chen, Yang; Zheng, Qi; Yu, Zhi; Bu, Jiajun; Yao, Cong

doi:10.1609/AAAI.V39I7.32735

AAAI 2025 pp. 6851-6859

Abstract

Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) tasks, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial for constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA. However, most existing evaluation methods for instruction data are limited to the textual content of the instructions themselves, which hinders the effective assessment of document instruction datasets and constrains their construction. In this paper, we propose ProcTag, a data-oriented method that assesses the efficacy of document instruction data. ProcTag tags the execution process of instructions rather than the instruction text itself. By using the diversity and complexity of these tags to assess the efficacy of a given dataset, ProcTag enables selective sampling or filtering of document instructions. Furthermore, DocLayPrompt, a novel semi-structured layout-aware document prompting strategy, is proposed for effectively representing documents. Experiments demonstrate that sampling existing open-source and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data. With ProcTag-based sampling on the generated document datasets, only 30.5 percent of the document instructions are needed to match the efficacy of the complete dataset.
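
The abstract does not spell out how tag diversity and complexity drive sampling, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes complexity is the number of process tags on an instruction and diversity is the number of distinct tags covered by the selection. The function name `sample_by_tags` and the example tags are hypothetical.

```python
from typing import Dict, List, Set


def sample_by_tags(tagged: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` instruction ids, favoring complexity, then diversity.

    Assumption (not from the paper): complexity = number of tags on an
    instruction, diversity = number of distinct tags covered so far.
    """
    # Most complex first; ties broken by id so the result is deterministic.
    ordered = sorted(tagged, key=lambda i: (-len(tagged[i]), i))
    selected: List[str] = []
    covered: Set[str] = set()
    # First pass: keep only instructions that contribute an unseen tag.
    for inst in ordered:
        if len(selected) >= budget:
            break
        if tagged[inst] - covered:
            selected.append(inst)
            covered |= tagged[inst]
    # Second pass: fill any remaining budget with the most complex leftovers.
    for inst in ordered:
        if len(selected) >= budget:
            break
        if inst not in selected:
            selected.append(inst)
    return selected


# Hypothetical process tags, invented for illustration only.
example = {
    "q1": {"locate_table", "read_cell"},
    "q2": {"locate_table", "read_cell", "compare_values"},
    "q3": {"read_title"},
    "q4": {"locate_table"},
}
picked = sample_by_tags(example, budget=2)
```

With a budget of 2, this sketch keeps `q2` (the most tags) and `q3` (the only instruction adding an unseen tag), and skips `q1` and `q4` because their tags are already covered.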

Cite

Text

Shen et al. "ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32735

Markdown

[Shen et al. "ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/shen2025aaai-proctag/) doi:10.1609/AAAI.V39I7.32735

BibTeX

@inproceedings{shen2025aaai-proctag,
  title     = {{ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data}},
  author    = {Shen, Yufan and Luo, Chuwei and Zhu, Zhaoqing and Chen, Yang and Zheng, Qi and Yu, Zhi and Bu, Jiajun and Yao, Cong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6851--6859},
  doi       = {10.1609/AAAI.V39I7.32735},
  url       = {https://mlanthology.org/aaai/2025/shen2025aaai-proctag/}
}