QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Abstract
Long video understanding has emerged as a crucial capability in real-world applications such as meeting summarization, video surveillance, educational lecture analysis, and content moderation. However, it remains computationally prohibitive for VideoLLMs, primarily due to two bottlenecks: 1) sequential video decoding, the process of converting the raw bit stream into RGB frames, which can take up to a minute for hour-long video inputs; and 2) costly prefilling of up to several million tokens for LLM inference, resulting in high latency and memory use. To address these challenges, we propose QuickVideo, a system-algorithm co-design that substantially accelerates long video understanding to support real-time downstream applications. It comprises three key innovations: QuickCodec, a parallelized CPU-based video decoder that achieves a 2–3× speedup by splitting videos into keyframe-aligned intervals that are processed concurrently; QuickPrefill, a memory-efficient prefilling method that uses KV-cache pruning to support more frames with less GPU memory; and an overlapping scheme that runs CPU video decoding concurrently with GPU inference. Together, these components reduce the time required to process a long video input by a minute, enabling fast, efficient video understanding even on limited hardware. Experiments show that QuickVideo generalizes across video durations and sampling rates, making long video processing feasible in practice.
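To make the decoding idea concrete, here is a minimal Python sketch of keyframe-aligned splitting with concurrent decoding. It is not QuickCodec's implementation: the greedy partition rule, the names `keyframe_aligned_intervals`, `decode_parallel`, and `fake_decode`, and the stand-in decoder are all hypothetical. It assumes the keyframe indices are already known, sorted, and strictly increasing (for example, read from the container's index), and that each interval starting at a keyframe can be decoded independently, which holds for closed GOPs. All intervals are submitted to a thread pool at once but yielded in temporal order, so a consumer such as the model's prefill can start on early frames while later ones are still decoding, loosely mirroring the CPU/GPU overlap described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of frame indices


def keyframe_aligned_intervals(
    keyframes: Sequence[int], num_frames: int, num_workers: int
) -> List[Interval]:
    """Hypothetical greedy split of [0, num_frames) into at most num_workers
    contiguous intervals, each starting at a keyframe (keyframes sorted)."""
    if not keyframes or keyframes[0] != 0:
        raise ValueError("frame 0 must be a keyframe")
    target = num_frames / num_workers  # ideal interval length
    starts = [0]
    for k in keyframes[1:]:
        # Open a new interval at the first keyframe at or past the next ideal boundary.
        if len(starts) < num_workers and k >= len(starts) * target:
            starts.append(k)
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


def decode_parallel(
    intervals: Sequence[Interval],
    decode_interval: Callable[[int, int], List[int]],
    num_workers: int,
) -> Iterator[List[int]]:
    """Decode all intervals concurrently, but yield results in temporal order
    so a consumer can process early chunks while later ones are decoding."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(decode_interval, s, e) for s, e in intervals]
        for fut in futures:
            yield fut.result()  # blocks only until this interval is ready


def fake_decode(start: int, end: int) -> List[int]:
    """Hypothetical stand-in decoder: returns frame indices instead of RGB frames."""
    return list(range(start, end))


if __name__ == "__main__":
    ivs = keyframe_aligned_intervals([0, 30, 60, 90], num_frames=120, num_workers=3)
    print(ivs)  # [(0, 60), (60, 90), (90, 120)]
    frames: List[int] = []
    for chunk in decode_parallel(ivs, fake_decode, num_workers=3):
        frames.extend(chunk)  # a real pipeline would hand each chunk to the model here
    assert frames == list(range(120))
```

Threads only yield real parallelism if the actual decoder releases Python's GIL while decoding; otherwise a `ProcessPoolExecutor` with a module-level decode function is the usual alternative. Yielding in order matters because the VideoLLM consumes frames in temporal sequence, so out-of-order completion must not reorder the stream.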
Cite
Schneider et al. "QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design." Transactions on Machine Learning Research, 2026.
@article{schneider2026tmlr-quickvideo,
title = {{QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design}},
author = {Schneider, Benjamin and Jiang, Dongfu and Du, Chao and Pang, Tianyu and Chen, Wenhu},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/schneider2026tmlr-quickvideo/}
}