Switch-a-View: View Selection Learned from Unlabeled In-the-Wild Videos

Abstract

We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is a way to train such a model from unlabeled, but human-edited, video samples. We pose a pretext task that pseudo-labels segments in the training videos with their primary viewpoint (egocentric or exocentric), and then discovers the patterns that link a how-to video's visual and spoken content to its view-switch moments. Armed with this predictor, our model can be applied to new multi-view videos to orchestrate which viewpoint should be displayed when. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D, and rigorously validate its advantages.
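
To make the inference stage concrete, here is a minimal sketch of how a learned per-timestep view scorer could drive view selection on a new multi-view video. This is an illustration under our own assumptions, not the paper's implementation: the `ViewSwitchScorer` module, the feature dimensions, and the `switch_penalty` hysteresis term are all hypothetical stand-ins for whatever predictor the trained model provides.

```python
import torch
import torch.nn as nn


class ViewSwitchScorer(nn.Module):
    """Hypothetical scorer: maps fused visual+speech features for each
    candidate view at each timestep to a single preference logit."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, V, D) -- T timesteps, V candidate views, D feature dim
        return self.head(feats).squeeze(-1)  # (T, V) preference logits


def orchestrate_views(feats: torch.Tensor, scorer: ViewSwitchScorer,
                      switch_penalty: float = 0.1) -> list[int]:
    """Pick one view index per timestep, mildly discouraging rapid cuts.
    The penalty term is an assumption for this sketch, not from the paper."""
    with torch.no_grad():
        logits = scorer(feats)  # (T, V)
    chosen, prev = [], None
    for t in range(logits.shape[0]):
        scores = logits[t].clone()
        if prev is not None:
            scores[prev] += switch_penalty  # small bonus for staying put
        prev = int(scores.argmax())
        chosen.append(prev)
    return chosen


if __name__ == "__main__":
    scorer = ViewSwitchScorer()
    feats = torch.randn(8, 2, 512)  # 8 timesteps, 2 views (ego, exo)
    print(orchestrate_views(feats, scorer))  # e.g. [0, 0, 1, 1, 1, 0, 0, 0]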
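```

The hand-set hysteresis bonus above merely illustrates the idea of stable cuts; per the abstract, the actual model instead learns when to switch views from the patterns in human-edited training videos.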

Cite

Text

Majumder et al. "Switch-a-View: View Selection Learned from Unlabeled In-the-Wild Videos." International Conference on Computer Vision, 2025.

Markdown

[Majumder et al. "Switch-a-View: View Selection Learned from Unlabeled In-the-Wild Videos." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/majumder2025iccv-switchaview/)

BibTeX

@inproceedings{majumder2025iccv-switchaview,
  title     = {{Switch-a-View: View Selection Learned from Unlabeled In-the-Wild Videos}},
  author    = {Majumder, Sagnik and Nagarajan, Tushar and Al-Halah, Ziad and Grauman, Kristen},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {11969--11979},
  url       = {https://mlanthology.org/iccv/2025/majumder2025iccv-switchaview/}
}