Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications

Abstract

This study focuses on optimizing neural network architectures for real-time detection of driver facial bounding boxes. Initially, we trained the Florence-2 model, which demonstrated high accuracy but proved too large for real-time applications. To address this, we employed model distillation, using Florence-2 as a teacher to train a more compact DINOv2 model. Our aim was to maintain high detection accuracy while minimizing memory usage and inference time, making the solution viable for real-time implementation on GPU and NPU devices. We present a comparative analysis of model performance in terms of IoU scores, memory consumption and inference times.
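As a rough illustration of the two quantities the abstract relies on, the sketch below computes IoU between two bounding boxes and a simple coordinate-regression loss that treats the teacher's predicted box as the training target for the student. The abstract does not state the box format or the distillation objective, so both are assumptions: the `(x_min, y_min, x_max, y_max)` layout, the mean-absolute-error loss, and the function names `iou` and `box_distillation_loss` are hypothetical and not taken from the paper.

```python
from typing import Sequence

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_distillation_loss(student: Box, teacher: Box) -> float:
    """Mean absolute error between student and teacher box coordinates.

    Assumption: the teacher's box is used as a pseudo-label for the student.
    The paper's actual objective may differ.
    """
    return sum(abs(s - t) for s, t in zip(student, teacher)) / 4.0


if __name__ == "__main__":
    teacher_box = (0.0, 0.0, 2.0, 2.0)   # hypothetical teacher prediction
    student_box = (1.0, 1.0, 3.0, 3.0)   # hypothetical student prediction
    print("IoU:", iou(student_box, teacher_box))
    print("Distillation loss:", box_distillation_loss(student_box, teacher_box))
```

In this toy case the boxes overlap in a 1×1 square out of a combined area of 7, so the IoU is 1/7, and every coordinate differs by 1, so the loss is 1.0. In a setup like the one described, the same IoU function could score the student against either ground-truth boxes or the teacher's outputs.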

Cite

Text

Chatziloizos et al. "Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications." NeurIPS 2024 Workshops: Compression, 2024.

Markdown

[Chatziloizos et al. "Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications." NeurIPS 2024 Workshops: Compression, 2024.](https://mlanthology.org/neuripsw/2024/chatziloizos2024neuripsw-integration/)

BibTeX

@inproceedings{chatziloizos2024neuripsw-integration,
  title     = {{Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications}},
  author    = {Chatziloizos, Georgios Markos and Ancora, Andrea and Comport, Andrew I. and Barat, Christian},
  booktitle = {NeurIPS 2024 Workshops: Compression},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chatziloizos2024neuripsw-integration/}
}