Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding

Abstract

Decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and build a solid foundation for bridging human and computer vision through the Brain-Computer Interface. However, reconstructing high-quality images with correct semantics from brain recordings is a challenging problem due to the complex underlying representations of brain signals and the scarcity of data annotations. In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. Firstly, we learn an effective self-supervised representation of fMRI data using mask modeling in a large latent space inspired by the sparse coding of information in the primary visual cortex. Then by augmenting a latent diffusion model with double-conditioning, we show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations. We benchmarked our model qualitatively and quantitatively; the experimental results indicate that our method outperformed state-of-the-art in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41% respectively. An exhaustive ablation study was also conducted to analyze our framework.

Cite

Text

Chen et al. "Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.02175

Markdown

[Chen et al. "Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/chen2023cvpr-seeing/) doi:10.1109/CVPR52729.2023.02175

BibTeX

@inproceedings{chen2023cvpr-seeing,
  title     = {{Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding}},
  author    = {Chen, Zijiao and Qing, Jiaxin and Xiang, Tiange and Yue, Wan Lin and Zhou, Juan Helen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {22710-22720},
  doi       = {10.1109/CVPR52729.2023.02175},
  url       = {https://mlanthology.org/cvpr/2023/chen2023cvpr-seeing/}
}