LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Abstract
Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous frame-by-frame inputs and determine optimal response timing, often compromising real-time responsiveness and narrative coherence. To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. Specifically, LiveStar incorporates: (1) a training strategy enabling incremental video-language alignment for variable-length video streams, preserving temporal consistency across dynamically evolving frame sequences; (2) a response-silence decoding framework that determines optimal proactive response timing via a single forward pass verification; (3) memory-aware acceleration via peak-end memory compression for online inference on 10+ minute videos, combined with a streaming key-value cache to achieve 1.53× faster inference. We also construct OmniStar, a comprehensive dataset for training and benchmarking that encompasses 15 diverse real-world scenarios and 5 evaluation tasks for online video understanding. Extensive experiments across three benchmarks demonstrate LiveStar's state-of-the-art performance, achieving an average 19.5% improvement in semantic correctness with 18.1% reduced timing difference compared to existing online Video-LLMs, while improving FPS by 12.0% across all five OmniStar tasks. Our model and dataset can be accessed at https://github.com/yzy-bupt/LiveStar.
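To make the response-silence idea in item (2) concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the verification signal is the perplexity of the previously emitted response re-scored against the newest frame, and that the assistant stays silent while that perplexity remains below a threshold. The function names, the perplexity criterion, and the threshold value are all assumptions introduced for illustration; the abstract does not specify them.

```python
import math
from typing import List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a token sequence given its natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def should_respond(prev_response_logprobs: Sequence[float], threshold: float) -> bool:
    """Hypothetical gate: respond only when the previous response no longer
    fits the current frame, i.e. its re-scored perplexity exceeds the threshold."""
    return perplexity(prev_response_logprobs) > threshold


def stream_decisions(per_frame_logprobs: Sequence[Sequence[float]],
                     threshold: float) -> List[str]:
    """Map each frame's re-scored log-probabilities to 'respond' or 'silence'.

    In a real system each inner sequence would come from one forward pass of
    the model over the previous response conditioned on the new frame; here
    the values are supplied directly (assumption)."""
    return ["respond" if should_respond(lp, threshold) else "silence"
            for lp in per_frame_logprobs]


if __name__ == "__main__":
    # Frame 1: previous response still likely -> silence.
    # Frame 2: previous response now unlikely -> respond.
    frames = [[math.log(0.9)] * 3, [math.log(0.1)] * 3]
    print(stream_decisions(frames, threshold=2.0))
```

Under these assumptions the decision for each frame needs only one scoring pass over already-generated tokens, which is the property the abstract attributes to its verification step.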
Cite
Text
Yang et al. "LiveStar: Live Streaming Assistant for Real-World Online Video Understanding." Advances in Neural Information Processing Systems, 2025.
Markdown
[Yang et al. "LiveStar: Live Streaming Assistant for Real-World Online Video Understanding." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yang2025neurips-livestar/)
BibTeX
@inproceedings{yang2025neurips-livestar,
title = {{LiveStar: Live Streaming Assistant for Real-World Online Video Understanding}},
author = {Yang, Zhenyu and Zhang, Kairui and Hu, Yuhang and Wang, Bing and Qian, Shengsheng and Wen, Bin and Yang, Fan and Gao, Tingting and Dong, Weiming and Xu, Changsheng},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/yang2025neurips-livestar/}
}