ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation

Lin, Jingzhong; Li, Xinru; Qi, Yuanyuan; Zhang, Bohao; Liu, Wenxiang; Tang, Kecheng; Huang, Wenxuan; Xu, Xiangfeng; Li, Bangyan; Wang, Changbo; He, Gaoqi

ICLR 2026

Abstract

Reactive dance generation (RDG), the task of generating a dance conditioned on a lead dancer's motion, holds significant promise for enhancing human-robot interaction and immersive digital entertainment. Despite progress in duet synchronization and motion-music alignment, two key challenges remain: generating fine-grained spatial interactions and ensuring long-term temporal coherence. In this work, we introduce ReactDance, a diffusion framework that operates on a novel hierarchical latent space to address these spatiotemporal challenges in RDG. First, for fine-grained spatial control and artistic expression, we propose Hierarchical Finite Scalar Quantization (HFSQ). This multi-scale motion representation effectively disentangles coarse body posture from high-frequency dynamics, enabling independent and detailed control over both aspects through a layered guidance mechanism. Second, to efficiently generate long sequences with high temporal coherence, we propose Blockwise Local Context (BLC), a non-autoregressive sampling strategy. Departing from slow, frame-by-frame generation, BLC partitions the sequence into blocks and synthesizes them in parallel via periodic causal masking and positional encodings. Coherence across these blocks is ensured by a dense sliding-window training approach that enriches the representation with local temporal context. Extensive experiments show that ReactDance substantially outperforms state-of-the-art methods in motion quality, long-term coherence, and sampling efficiency. Project page: https://ripemangobox.github.io/ReactDance.
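One way to picture how a multi-scale quantizer can separate coarse posture from high-frequency dynamics is the NumPy sketch below. It is not the authors' HFSQ: it assumes the hierarchy is temporal (each level pools the current residual over a fixed number of frames before quantizing it), and it replaces FSQ's fixed bounding function with a per-level max-abs scale. The names `scalar_quantize`, `hierarchical_quantize`, `factors`, and `levels` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a hierarchical (coarse-to-fine) scalar quantizer.
# Assumptions, not from the paper: the hierarchy is temporal, and a per-level
# max-abs scale stands in for FSQ's fixed bounding function.


def scalar_quantize(z: np.ndarray, levels: int = 7):
    """Round every value of `z` to one of `levels` evenly spaced codes."""
    assert levels % 2 == 1, "odd level count so that 0 is a valid code"
    half = (levels - 1) // 2
    scale = float(np.abs(z).max()) or 1.0  # guard against an all-zero input
    codes = np.round(z / scale * half).astype(np.int64)  # ints in [-half, half]
    return codes, codes / half * scale  # integer codes and dequantized values


def hierarchical_quantize(x: np.ndarray, factors=(4, 2, 1), levels: int = 7):
    """Residual quantization of a (T, C) sequence, coarsest level first.

    Level k averages the current residual over windows of factors[k] frames,
    quantizes the pooled signal, repeats it back to T frames and subtracts it,
    so early levels hold slow posture and later levels hold fast detail.
    """
    t, c = x.shape
    residual = x.astype(np.float64)  # astype copies, so `x` is left untouched
    recon = np.zeros_like(residual)
    all_codes = []
    for f in factors:
        assert t % f == 0, "sketch assumes T is divisible by every factor"
        pooled = residual.reshape(t // f, f, c).mean(axis=1)  # (T // f, C)
        codes, deq = scalar_quantize(pooled, levels)
        up = np.repeat(deq, f, axis=0)  # upsample back to (T, C)
        recon += up
        residual -= up
        all_codes.append(codes)
    return all_codes, recon


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 64)
    # Channel 0: slow "posture" curve; channel 1: small high-frequency wiggle.
    x = np.stack([np.sin(t), 0.2 * np.sin(8.0 * t)], axis=1)
    codes, recon = hierarchical_quantize(x)
    print([c.shape for c in codes])  # [(16, 2), (32, 2), (64, 2)]
    print(float(np.abs(x - recon).max()))
```

Each level yields its own code tensor, so under these assumptions a generator could be guided separately on the coarse codes (posture) and on the fine codes (detail), which is the kind of layered control the abstract describes.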
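The blockwise parallel sampling idea can be illustrated with an attention mask and position indices. This is a sketch under stated assumptions, not the paper's code: we assume the "periodic causal mask" is causal inside each block and never lets a query attend across blocks, that positional encodings restart at every block boundary, and that "dense" sliding-window training means taking every window at stride 1. All function names (`periodic_causal_mask`, `periodic_positions`, `sinusoidal_encoding`, `dense_windows`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of blockwise-parallel attention for BLC-style sampling.
# Assumptions, not from the paper: attention is causal inside each block and
# never crosses blocks, and positions restart at every block boundary.


def periodic_causal_mask(seq_len: int, block: int) -> np.ndarray:
    """(seq_len, seq_len) bool mask; True where query i may attend to key j."""
    idx = np.arange(seq_len)
    same_block = (idx[:, None] // block) == (idx[None, :] // block)
    causal = idx[None, :] <= idx[:, None]  # key j not later than query i
    return same_block & causal


def periodic_positions(seq_len: int, block: int) -> np.ndarray:
    """Position ids that restart at 0 in every block."""
    return np.arange(seq_len) % block


def sinusoidal_encoding(pos: np.ndarray, dim: int) -> np.ndarray:
    """Sine/cosine encoding (halves concatenated), shape (len(pos), dim)."""
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) * 2.0 / dim))
    angles = pos[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def dense_windows(x: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """All length-`window` crops of a (T, C) sequence; stride 1 is the assumed 'dense'."""
    starts = range(0, x.shape[0] - window + 1, stride)
    return np.stack([x[s : s + window] for s in starts])


if __name__ == "__main__":
    print(periodic_causal_mask(6, 3).astype(int))
    # [[1 0 0 0 0 0]
    #  [1 1 0 0 0 0]
    #  [1 1 1 0 0 0]
    #  [0 0 0 1 0 0]
    #  [0 0 0 1 1 0]
    #  [0 0 0 1 1 1]]
    pe = sinusoidal_encoding(periodic_positions(6, 3), dim=8)
    print(np.allclose(pe[:3], pe[3:]))  # True: both blocks see identical positions
    print(dense_windows(np.zeros((10, 2)), window=4).shape)  # (7, 4, 2)
```

Under these assumptions every block is self-contained and reuses positions 0 to B-1, so all blocks can be processed in one batched pass instead of frame by frame; training on densely overlapping windows is one plausible way to expose the model to every local context it will meet at block boundaries.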

Cite

Text

Lin et al. "ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation." International Conference on Learning Representations, 2026.

Markdown

[Lin et al. "ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lin2026iclr-reactdance/)

BibTeX

@inproceedings{lin2026iclr-reactdance,
  title     = {{ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation}},
  author    = {Lin, Jingzhong and Li, Xinru and Qi, Yuanyuan and Zhang, Bohao and Liu, Wenxiang and Tang, Kecheng and Huang, Wenxuan and Xu, Xiangfeng and Li, Bangyan and Wang, Changbo and He, Gaoqi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lin2026iclr-reactdance/}
}