HiVid: LLM-Guided Video Saliency for Content-Aware VOD and Live Streaming

Chen, Jiahui; Peng, Bo; Jia, Lianchen; Zhang, Zeyu; Huang, Tianchi; Sun, Lifeng

ICLR 2026

Abstract

Content-aware streaming requires dynamic, chunk-level importance weights to optimize subjective quality of experience (QoE). However, direct human annotation is prohibitively expensive, while vision-saliency models generalize poorly. We introduce HiVid, the first framework to leverage Large Language Models (LLMs) as a scalable human proxy to generate high-fidelity weights for both Video-on-Demand (VOD) and live streaming. We address three non-trivial challenges: (1) To extend LLMs' limited modality and circumvent token limits, we propose a perception module that assesses frames in a local context window, autoregressively building a coherent understanding of the video. (2) For VOD, where ratings are inconsistent across local windows, we propose a ranking module that performs global re-ranking with a novel LLM-guided merge-sort algorithm. (3) For live streaming, which requires low-latency online inference without future knowledge, we propose a prediction module that predicts future weights with a multi-modal time series model comprising content-aware attention and an adaptive horizon to accommodate asynchronous LLM inference. Extensive experiments show that HiVid improves weight prediction accuracy by up to 11.5% for VOD and 26% for live streaming over state-of-the-art baselines. A real-world user study validates that HiVid boosts streaming QoE correlation by 14.7%.
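The abstract names an LLM-guided merge-sort for global re-ranking but does not spell out its details. The following is only a minimal sketch of the general idea, assuming the LLM acts as a pairwise comparator during the merge step. The names `merge_sort_rank`, `ranks_to_weights`, `toy_prefer`, and `toy_scores` are hypothetical, the comparator is a deterministic stand-in for an LLM call, and the linear rank-to-weight mapping is an illustrative assumption rather than the paper's method.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical comparator type: returns True if chunk `a` should rank
# at or above chunk `b`. In a real system this might wrap an LLM query.
Compare = Callable[[int, int], bool]


def merge_sort_rank(chunks: Sequence[int], prefer: Compare) -> List[int]:
    """Order chunk ids from most to least important using pairwise comparisons."""
    if len(chunks) <= 1:
        return list(chunks)
    mid = len(chunks) // 2
    left = merge_sort_rank(chunks[:mid], prefer)
    right = merge_sort_rank(chunks[mid:], prefer)

    # Standard merge step; each comparison is delegated to `prefer`.
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if prefer(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def ranks_to_weights(order: Sequence[int]) -> Dict[int, float]:
    """Assumed mapping: spread weights linearly from 1.0 (top) to 0.0 (bottom)."""
    n = len(order)
    if n == 1:
        return {order[0]: 1.0}
    return {cid: 1.0 - pos / (n - 1) for pos, cid in enumerate(order)}


# Toy stand-in for the LLM: compares hidden per-chunk scores.
toy_scores = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.7}


def toy_prefer(a: int, b: int) -> bool:
    return toy_scores[a] >= toy_scores[b]


order = merge_sort_rank([0, 1, 2, 3], toy_prefer)
weights = ranks_to_weights(order)
print(order)    # [1, 3, 2, 0]
print(weights)
```

Merge sort needs O(n log n) comparisons, which is one plausible reason to prefer it over comparing every pair of chunks when each comparison is an expensive model call.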

Cite

Text

Chen et al. "HiVid: LLM-Guided Video Saliency for Content-Aware VOD and Live Streaming." International Conference on Learning Representations, 2026.

Markdown

[Chen et al. "HiVid: LLM-Guided Video Saliency for Content-Aware VOD and Live Streaming." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-hivid/)

BibTeX

@inproceedings{chen2026iclr-hivid,
  title     = {{HiVid: LLM-Guided Video Saliency for Content-Aware VOD and Live Streaming}},
  author    = {Chen, Jiahui and Peng, Bo and Jia, Lianchen and Zhang, Zeyu and Huang, Tianchi and Sun, Lifeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chen2026iclr-hivid/}
}