Envision Human-AI Perceptual Alignment from a Multimodal Interaction Perspective
Abstract
Aligning AI with human intent has seen progress, yet perceptual alignment—how AI interprets what we see, hear, feel, or smell—remains underexplored. This paper advocates for expanding perceptual alignment efforts across multimodal sensory modalities, such as touch and olfaction, which are critical for how humans perceive and interpret their environment. We envision AI systems enabling natural, multimodal interactions in everyday contexts, such as selecting clothing that aligns with temperature and texture preferences or recreating rich sensory ambiances that evoke specific sights, sounds, and smells. By advancing multimodal representation learning and perceptual alignment, this work aims to inspire the computer science and human-computer interaction (HCI) communities to design inclusive, human-centered AI systems for everyday, multisensory experiences.
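To make the idea of multimodal representation learning for perceptual alignment concrete, the sketch below pairs a tactile-signal encoder with a text encoder and trains them with a symmetric contrastive (InfoNCE) objective, so that matched sensor readings and human descriptions land near each other in a shared embedding space. This is a minimal illustration under assumed choices: the encoder architectures, input dimensions, and toy batch are hypothetical, and the paper itself does not prescribe any particular implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps raw features of one sensory modality (e.g. tactile or olfactory
    sensor readings, or a text description) into a shared embedding space.
    Architecture and sizes are illustrative assumptions, not from the paper."""
    def __init__(self, input_dim: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise so cosine similarity reduces to a dot product.
        return F.normalize(self.net(x), dim=-1)

def contrastive_alignment_loss(touch_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matched (touch, description) pairs are pulled
    together, mismatched pairs within the batch are pushed apart."""
    logits = touch_emb @ text_emb.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    # Toy batch: 8 tactile readings (64-dim) paired with 8 text-description features (128-dim).
    touch_encoder = ModalityEncoder(input_dim=64)
    text_encoder = ModalityEncoder(input_dim=128)
    touch_batch = torch.randn(8, 64)
    text_batch = torch.randn(8, 128)
    loss = contrastive_alignment_loss(touch_encoder(touch_batch), text_encoder(text_batch))
    print(f"alignment loss: {loss.item():.4f}")

Under this framing, a stated preference such as "soft and warm" and a fabric's sensor signature would map to nearby embeddings, which is one plausible route toward the clothing-selection scenario described in the abstract.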
Cite
Text
Zhong and Obrist. "Envision Human-AI Perceptual Alignment from a Multimodal Interaction Perspective." ICLR 2025 Workshops: Bi-Align, 2025.
Markdown
[Zhong and Obrist. "Envision Human-AI Perceptual Alignment from a Multimodal Interaction Perspective." ICLR 2025 Workshops: Bi-Align, 2025.](https://mlanthology.org/iclrw/2025/zhong2025iclrw-envision/)
BibTeX
@inproceedings{zhong2025iclrw-envision,
  title     = {{Envision Human-AI Perceptual Alignment from a Multimodal Interaction Perspective}},
  author    = {Zhong, Shu and Obrist, Marianna},
  booktitle = {ICLR 2025 Workshops: Bi-Align},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/zhong2025iclrw-envision/}
}