The Role of Video Generation in Enhancing Data-Limited Action Understanding

Li, Wei; Luo, Dezhao; Yang, Dongbao; Li, Zhenhang; Wang, Weiping; Zhou, Yu

doi:10.24963/IJCAI.2025/160

IJCAI 2025 pp. 1431-1439

Abstract

Video action understanding tasks in real-world scenarios often suffer from data limitations. In this paper, we address the data-limited action understanding problem by mitigating data scarcity. We propose a novel method that leverages a text-to-video diffusion transformer to generate annotated data for model training. This paradigm enables the generation of realistic annotated data at an effectively unlimited scale without human intervention. We propose the Information Enhancement Strategy and the Uncertainty-Based Soft Target, both tailored to training on generated samples. Through quantitative and qualitative analyses, we find that real samples generally contain richer information than generated samples. Based on this observation, the Information Enhancement Strategy is designed to enrich the informational content of the generated samples from two perspectives: the environment and the character. Furthermore, we observe that a portion of low-quality generated samples can negatively affect model training. To address this, we devise an uncertainty-based label-smoothing strategy that increases the smoothing applied to these low-quality samples, thereby reducing their impact. We demonstrate the effectiveness of the proposed method on four datasets and five tasks, and achieve state-of-the-art performance for zero-shot action recognition.
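
The abstract does not give the formula behind the uncertainty-based label smoothing, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each generated sample comes with an uncertainty score in [0, 1] (higher meaning lower quality) and that the smoothing factor grows linearly with that score up to a cap. The function name `soft_targets` and the parameters `uncertainty` and `max_smoothing` are hypothetical.

```python
def soft_targets(label, num_classes, uncertainty, max_smoothing=0.5):
    """Build a soft target distribution for one generated sample.

    Assumptions (not taken from the paper):
      - uncertainty is a score in [0, 1]; higher means a lower-quality sample
      - the smoothing factor scales linearly with uncertainty
      - max_smoothing caps how much mass is moved off the labelled class
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0 <= label < num_classes:
        raise ValueError("label must be a valid class index")
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")

    # More uncertain samples get a larger smoothing factor.
    eps = max_smoothing * uncertainty

    # Spread the removed probability mass evenly over the other classes.
    off_value = eps / (num_classes - 1)
    target = [off_value] * num_classes
    target[label] = 1.0 - eps
    return target


# A confident sample keeps a sharp target; an uncertain one is flattened.
confident = soft_targets(label=0, num_classes=4, uncertainty=0.1)
noisy = soft_targets(label=0, num_classes=4, uncertainty=0.9)
```

Under these assumptions, a low-quality sample contributes a flatter target and therefore a weaker training signal for its labelled class, which is the effect the abstract describes.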

Cite

Text

Li et al. "The Role of Video Generation in Enhancing Data-Limited Action Understanding." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/160

Markdown

[Li et al. "The Role of Video Generation in Enhancing Data-Limited Action Understanding." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/li2025ijcai-role/) doi:10.24963/IJCAI.2025/160

BibTeX

@inproceedings{li2025ijcai-role,
  title     = {{The Role of Video Generation in Enhancing Data-Limited Action Understanding}},
  author    = {Li, Wei and Luo, Dezhao and Yang, Dongbao and Li, Zhenhang and Wang, Weiping and Zhou, Yu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1431-1439},
  doi       = {10.24963/IJCAI.2025/160},
  url       = {https://mlanthology.org/ijcai/2025/li2025ijcai-role/}
}