GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Abstract

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models with great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications across the healthcare, industrial, and academic sectors. Although such foundation models perform well on a wide range of general tasks, their capability without fine-tuning is often limited on specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation, memory, and dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and that its performance improves with few-shot, retrieval-augmented in-context learning.
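The sketch below illustrates what few-shot, retrieval-augmented in-context learning with GPT-4o might look like in practice, assuming the standard OpenAI Chat Completions API. It is not the authors' released code: the file paths, gesture label set, embedding bank, and cosine-similarity retrieval step are all illustrative assumptions.

```python
"""Illustrative sketch (not the authors' code): classify a forearm-ultrasound
frame with GPT-4o, optionally prepending retrieved labeled examples as
in-context demonstrations. Paths, labels, and the retrieval step are
assumptions for illustration only."""
import base64
from pathlib import Path

import numpy as np
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def image_part(path: str) -> dict:
    """Encode a local image as a data-URL message part for the vision API."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}


def retrieve_examples(query_emb, bank, k=3):
    """Toy retrieval: return the k labeled examples whose precomputed
    embeddings are most cosine-similar to the query embedding."""
    scored = []
    for emb, path, label in bank:  # bank: list of (embedding, image_path, label)
        sim = float(np.dot(query_emb, emb) /
                    (np.linalg.norm(query_emb) * np.linalg.norm(emb)))
        scored.append((sim, path, label))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(p, l) for _, p, l in scored[:k]]


def classify_gesture(test_image: str, examples, labels):
    """Ask GPT-4o for a gesture label, with retrieved examples as few-shot context."""
    content = [{"type": "text",
                "text": "You are shown forearm ultrasound images. "
                        f"Answer with exactly one label from: {', '.join(labels)}."}]
    for path, label in examples:  # few-shot demonstrations (empty list = zero-shot)
        content.append(image_part(path))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(test_image))
    content.append({"type": "text", "text": "Label:"})
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
        max_tokens=5,
    )
    return resp.choices[0].message.content.strip()
```

In a zero-shot run, `examples` would simply be an empty list; the gains reported in the abstract come from adding the retrieved few-shot demonstrations to the prompt.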

Cite

Text

Bimbraw et al. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM." NeurIPS 2024 Workshops: AIM-FM, 2024.

Markdown

[Bimbraw et al. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/bimbraw2024neuripsw-gpt/)

BibTeX

@inproceedings{bimbraw2024neuripsw-gpt,
  title     = {{GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM}},
  author    = {Bimbraw, Keshav and Wang, Ye and Liu, Jing and Koike-Akino, Toshiaki},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/bimbraw2024neuripsw-gpt/}
}