Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals

Abstract

The field of brain-to-stimuli reconstruction has seen significant progress in the last few years, but techniques remain subject-specific and are usually tested on a single dataset. In this work, we present a novel technique to reconstruct videos from functional Magnetic Resonance Imaging (fMRI) signals, designed for performance across datasets and across human participants. Our pipeline accurately generates 2- and 3-second video clips from brain activity recorded from distinct participants and different datasets by leveraging multi-dataset and multi-subject training. This training lets us regress key latent and conditioning vectors for pretrained text-to-video and video-to-video models, which then reconstruct accurate videos that match the original stimuli observed by the participant. Key to our pipeline is a 3-stage approach that first aligns fMRI signals to semantic embeddings, then regresses important vectors, and finally generates videos from those estimates. Our method demonstrates state-of-the-art reconstruction capabilities, verified by qualitative and quantitative analyses, including crowd-sourced human evaluation. We showcase performance improvements across two datasets, as well as in multi-subject setups. Our ablation studies shed light on how different alignment strategies and data scaling decisions impact reconstruction performance, and we hint at a future for zero-shot reconstruction by analyzing how performance evolves as more subject data is leveraged.
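
The regression stage of the 3-stage approach can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the regressor, so closed-form ridge regression is used here purely as a stand-in, the data are synthetic, and all names (`fit_ridge`, `fmri`, `latents`) and dimensions are hypothetical. It only shows the general idea of mapping fMRI responses to latent vectors that a pretrained generative model could later consume.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


# Synthetic stand-ins (assumptions, not real data):
# each row of `fmri` is one clip's voxel response,
# each row of `latents` is the latent vector of the clip that was shown.
rng = np.random.default_rng(0)
n_clips, n_voxels, latent_dim = 200, 50, 8
true_W = rng.normal(size=(n_voxels, latent_dim))
fmri = rng.normal(size=(n_clips, n_voxels))
latents = fmri @ true_W + 0.1 * rng.normal(size=(n_clips, latent_dim))

# Fit on the first 150 clips, evaluate on the 50 held-out clips.
W = fit_ridge(fmri[:150], latents[:150], alpha=1.0)
pred = fmri[150:] @ W
mse = float(np.mean((pred - latents[150:]) ** 2))
```

In a real pipeline, the predicted vectors in `pred` would be passed as latents or conditioning inputs to a pretrained video model; that generation step is outside the scope of this sketch.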

Cite

Text

Fosco et al. "Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73347-5_26

Markdown

[Fosco et al. "Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/fosco2024eccv-brain/) doi:10.1007/978-3-031-73347-5_26

BibTeX

@inproceedings{fosco2024eccv-brain,
  title     = {{Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals}},
  author    = {Fosco, Camilo L and Lahner, Benjamin and Pan, Bowen and Andonian, Alex and Josephs, Emilie L and Lascelles, Alex and Oliva, Aude},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73347-5_26},
  url       = {https://mlanthology.org/eccv/2024/fosco2024eccv-brain/}
}