Preacher: Paper-to-Video Agentic System

Abstract

The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited LLM context windows, rigid limits on video duration, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation that synthesizes diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models.
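The sketch below illustrates the pipeline shape described in the abstract: top-down decomposition into sections, iterative key-scene planning in the spirit of P-CoT, and bottom-up synthesis of segments. All names here (PaperSection, KeyScene, plan_scene, render_segment, paper_to_video) are illustrative placeholders under our own assumptions, not the authors' actual interface, and the LLM and video-generation calls are stubbed out.

```python
# Minimal sketch of a paper-to-video workflow, assuming placeholder names.
# None of these classes or functions come from the Preacher codebase.
from dataclasses import dataclass
from typing import List


@dataclass
class PaperSection:
    title: str
    text: str


@dataclass
class KeyScene:
    """A cross-modal unit: a summarized concept paired with a visual plan."""
    summary: str
    visual_plan: List[str]  # refined round by round (progressive planning)


def decompose(paper_text: str) -> List[PaperSection]:
    """Top-down: split the paper into sections (toy heuristic)."""
    chunks = [c.strip() for c in paper_text.split("\n\n") if c.strip()]
    return [PaperSection(title=f"section_{i}", text=c) for i, c in enumerate(chunks)]


def plan_scene(section: PaperSection, rounds: int = 3) -> KeyScene:
    """Iteratively refine a scene plan, mimicking progressive chain-of-thought."""
    plan: List[str] = []
    for r in range(rounds):
        # In a real system each round would query an LLM to refine the prior draft.
        plan.append(f"round {r}: visualize '{section.text[:40]}...'")
    return KeyScene(summary=section.text[:80], visual_plan=plan)


def render_segment(scene: KeyScene) -> str:
    """Bottom-up: hand a finished scene plan to a video generator (stubbed)."""
    return f"<video segment for: {scene.summary[:30]}...>"


def paper_to_video(paper_text: str) -> List[str]:
    sections = decompose(paper_text)             # top-down decomposition
    scenes = [plan_scene(s) for s in sections]   # key-scene planning
    return [render_segment(sc) for sc in scenes]  # segment synthesis


if __name__ == "__main__":
    demo_paper = "Introduction...\n\nMethod...\n\nResults..."
    print(paper_to_video(demo_paper))
```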

Cite

Text

Liu et al. "Preacher: Paper-to-Video Agentic System." International Conference on Computer Vision, 2025.

Markdown

[Liu et al. "Preacher: Paper-to-Video Agentic System." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/liu2025iccv-preacher/)

BibTeX

@inproceedings{liu2025iccv-preacher,
  title     = {{Preacher: Paper-to-Video Agentic System}},
  author    = {Liu, Jingwei and Yang, Ling and Luo, Hao and Wang, Fan and Li, Hongyan and Wang, Mengdi},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {17129--17139},
  url       = {https://mlanthology.org/iccv/2025/liu2025iccv-preacher/}
}