ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman

CVPR 2025 pp. 13582-13594

doi:10.1109/CVPR52734.2025.01268

Abstract

Feedback is essential for learning a new skill or improving one's current skill level. However, current methods for skill assessment from video only provide scores or compare demonstrations, leaving the burden of knowing what to do differently on the user. We introduce a novel method to generate actionable feedback (AF) from video of a person doing a physical activity, such as basketball or soccer. Our method takes a video demonstration and its accompanying 3D body pose and generates (1) free-form expert commentary describing what the person is doing well and what they could improve, and (2) a visual expert demonstration that incorporates the required corrections. We show how to leverage Ego-Exo4D's videos of skilled activity and expert commentary together with a strong language model to create a weakly supervised training dataset for this task, and we devise a multimodal video-language model to infer coaching feedback. Our method reasons across multimodal input combinations to output full-spectrum, actionable coaching (expert commentary, expert video retrieval, and expert pose generation), outperforming strong vision-language models on both established metrics and human preference studies.

Cite

Text

Ashutosh et al. "ExpertAF: Expert Actionable Feedback from Video." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01268

Markdown

[Ashutosh et al. "ExpertAF: Expert Actionable Feedback from Video." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ashutosh2025cvpr-expertaf/) doi:10.1109/CVPR52734.2025.01268

BibTeX

@inproceedings{ashutosh2025cvpr-expertaf,
  title     = {{ExpertAF: Expert Actionable Feedback from Video}},
  author    = {Ashutosh, Kumar and Nagarajan, Tushar and Pavlakos, Georgios and Kitani, Kris and Grauman, Kristen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {13582--13594},
  doi       = {10.1109/CVPR52734.2025.01268},
  url       = {https://mlanthology.org/cvpr/2025/ashutosh2025cvpr-expertaf/}
}