GenVOG-DiT: A Transformer-Based Diffusion Model for Pose-Driven, Patient-Agnostic Nystagmus VOG Video Generation

Rahman, Aimon; Green, Kemar E.; Patel, Vishal M.

MIDL 2026 pp. 2265-2282

Abstract

Nystagmus, an involuntary eye movement indicative of neurological and vestibular disorders, is traditionally diagnosed using costly equipment or expert visual inspection, both of which limit accessibility in nonspecialist settings. Recent advances in computer vision and deep learning present an opportunity to automate the detection of nystagmus from standard video recordings. However, progress is hindered by the scarcity of publicly available video datasets, owing to privacy concerns surrounding ocular biometric data. In this work, we propose using synthetically generated eye movement videos to mitigate these data limitations. Using video diffusion models, we simulate diverse, clinically plausible nystagmus patterns without relying on real patient data, enabling scalable training while preserving privacy. We show that models trained on synthetic data generalize effectively to real-world settings and have potential for integration into telehealth applications. Our approach advances the development of accessible, generalizable, and privacy-aware diagnostic tools for eye movement disorders.

Cite

Text

Rahman et al. "GenVOG-DiT: A Transformer-Based Diffusion Model for Pose-Driven, Patient-Agnostic Nystagmus VOG Video Generation." Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.

Markdown

[Rahman et al. "GenVOG-DiT: A Transformer-Based Diffusion Model for Pose-Driven, Patient-Agnostic Nystagmus VOG Video Generation." Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.](https://mlanthology.org/midl/2026/rahman2026midl-genvogdit/)

BibTeX

@inproceedings{rahman2026midl-genvogdit,
  title     = {{GenVOG-DiT: A Transformer-Based Diffusion Model for Pose-Driven, Patient-Agnostic Nystagmus VOG Video Generation}},
  author    = {Rahman, Aimon and Green, Kemar E. and Patel, Vishal M.},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  year      = {2026},
  pages     = {2265-2282},
  volume    = {315},
  url       = {https://mlanthology.org/midl/2026/rahman2026midl-genvogdit/}
}