In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding

Abstract

A Standard Operating Procedure (SOP) is a step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows, but creating SOPs manually can be time-consuming. Recent advances in large video-language models offer the potential to automate SOP generation by analyzing recordings of human demonstrations. However, current large video-language models struggle with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose In-Context Ensemble Learning to aggregate pseudo labels of SOPs. The proposed in-context ensemble learning increases test-time compute and enables the models to learn beyond their context window limit, with an implicit consistency regularisation. We report that in-context learning helps video-language models generate more temporally accurate SOPs, and that the proposed in-context ensemble learning consistently enhances the SOP-generation capabilities of video-language models.
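The abstract does not spell out how the pseudo-labelled examples are batched or how the ensemble outputs are merged. As a rough illustration only, the Python sketch below assumes that pseudo-labelled demonstrations are split into batches small enough to fit one context window, that the model returns one SOP (a list of step strings) per batch, and that the per-batch SOPs are merged by a position-wise majority vote. `chunk`, `aggregate_by_vote`, `in_context_ensemble`, `toy_model`, and the `Example`/`Model` types are hypothetical names, and the vote is a swapped-in aggregation rule, not necessarily the one used in the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (video identifier, pseudo-label SOP as a list of steps).
Example = Tuple[str, List[str]]
# Hypothetical model interface: (in-context examples, query video) -> predicted SOP steps.
Model = Callable[[List[Example], str], List[str]]


def chunk(examples: Sequence[Example], size: int) -> List[List[Example]]:
    """Split the pseudo-labelled examples into batches that each fit in one prompt."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(examples[i:i + size]) for i in range(0, len(examples), size)]


def aggregate_by_vote(predictions: Sequence[List[str]]) -> List[str]:
    """Merge SOPs by position-wise majority vote (an assumed aggregation rule)."""
    if not predictions:
        return []
    # Use the most common SOP length across ensemble members.
    length = Counter(len(p) for p in predictions).most_common(1)[0][0]
    merged = []
    for i in range(length):
        # Only members that predicted at least i + 1 steps vote on step i.
        votes = Counter(p[i] for p in predictions if len(p) > i)
        merged.append(votes.most_common(1)[0][0])
    return merged


def in_context_ensemble(query: str, examples: Sequence[Example],
                        model: Model, context_size: int) -> List[str]:
    """Query the model once per batch, then aggregate the per-batch SOPs."""
    predictions = [model(batch, query) for batch in chunk(examples, context_size)]
    return aggregate_by_vote(predictions)


def toy_model(batch: List[Example], query: str) -> List[str]:
    # Stand-in for a video-language model: echoes the first in-context pseudo label.
    return list(batch[0][1])


if __name__ == "__main__":
    demos = [
        ("demo_1", ["Open CRM", "Click New", "Save"]),
        ("demo_2", ["Open CRM", "Click New", "Save"]),
        ("demo_3", ["Open CRM", "Click Edit", "Save"]),
    ]
    # Each call sees one demo; together the ensemble uses all three.
    print(in_context_ensemble("new_recording", demos, toy_model, context_size=1))
    # → ['Open CRM', 'Click New', 'Save']
```

Because each model call sees only `context_size` examples, the ensemble as a whole can draw on more pseudo-labelled demonstrations than a single prompt holds, at the cost of one extra call per batch. Voting also keeps only steps that recur across independently conditioned outputs, which is one plausible reading of the consistency regularisation mentioned above. A real system would additionally need to match steps that are phrased differently before voting.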

Cite

Text

Xu et al. "In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding." NeurIPS 2024 Workshops: Video-Language Models, 2024.

Markdown

[Xu et al. "In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding." NeurIPS 2024 Workshops: Video-Language Models, 2024.](https://mlanthology.org/neuripsw/2024/xu2024neuripsw-incontext/)

BibTeX

@inproceedings{xu2024neuripsw-incontext,
  title     = {{In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models in Low-Level Workflow Understanding}},
  author    = {Xu, Moucheng and Chatzaroulas, Evangelos and McCutcheon, Luc and Ahad, Abdul and Azeem, Hamzah and Marecki, Janusz and Anwar, Ammar},
  booktitle = {NeurIPS 2024 Workshops: Video-Language Models},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/xu2024neuripsw-incontext/}
}