In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding
Abstract
A Standard Operating Procedure (SOP) is a step-by-step written guide to a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows, but creating SOPs manually can be time-consuming. Recent advances in large video-language models offer the potential to automate SOP generation by analyzing recordings of human demonstrations. However, current large video-language models struggle with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose In-Context Ensemble Learning, which aggregates pseudo labels of SOPs. The proposed in-context ensemble learning increases test-time compute and enables the models to learn beyond their context window limit with an implicit consistency regularisation. We report that in-context learning helps video-language models generate more temporally accurate SOPs, and that the proposed in-context ensemble learning consistently enhances the capabilities of video-language models in SOP generation.
Cite
Text
Xu et al. "In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding." NeurIPS 2024 Workshops: Video-Language Models, 2024.
Markdown
[Xu et al. "In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding." NeurIPS 2024 Workshops: Video-Language Models, 2024.](https://mlanthology.org/neuripsw/2024/xu2024neuripsw-incontext/)
BibTeX
@inproceedings{xu2024neuripsw-incontext,
title = {{In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding}},
author = {Xu, Moucheng and Chatzaroulas, Evangelos and McCutcheon, Luc and Ahad, Abdul and Azeem, Hamzah and Marecki, Janusz and Anwar, Ammar},
booktitle = {NeurIPS 2024 Workshops: Video-Language Models},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/xu2024neuripsw-incontext/}
}