Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Abstract
Transformer-based models generate hidden states that are difficult to interpret. In this work, we aim to interpret these hidden states and control them at inference, with a focus on motion forecasting. We leverage the phenomenon of neural collapse and use linear probes to measure interpretable features in hidden states. Our experiments reveal meaningful directions and distances between hidden states of opposing features, which we use to fit control vectors for activation steering. Consequently, our method enables controlling transformer-based motion forecasting models with interpretable features, providing a unique interface to interact with and understand these models.
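The following is a minimal sketch of the difference-in-means idea described in the abstract: a control vector is fit from the gap between mean hidden states of opposing feature classes, then added to a layer's activations at inference. The tensor shapes, the `alpha` scaling factor, and the hook-based steering mechanism are illustrative assumptions, not the authors' implementation.

```python
import torch

def fit_control_vector(hidden_a: torch.Tensor, hidden_b: torch.Tensor) -> torch.Tensor:
    """Fit a control vector as the difference between the mean hidden states
    of two opposing feature classes (e.g., "slow" vs. "fast" motion).

    hidden_a, hidden_b: (num_samples, hidden_dim) hidden states collected
    from one transformer layer for each class.
    """
    direction = hidden_b.mean(dim=0) - hidden_a.mean(dim=0)
    return direction / direction.norm()  # unit-norm steering direction

def steer(module: torch.nn.Module, alpha: float, control_vector: torch.Tensor):
    """Register a forward hook that shifts the module's output along the
    control vector; alpha sets the steering strength and sign."""
    def hook(mod, inputs, output):
        # Returning a value from a forward hook replaces the module output.
        return output + alpha * control_vector
    return module.register_forward_hook(hook)
```

In this sketch, steering is applied by calling `steer(model.layers[k], alpha, v)` before running inference; removing the returned hook handle restores the unmodified model.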
Cite
Text
Tas and Wagner. "Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers." NeurIPS 2024 Workshops: InterpretableAI, 2024.

Markdown

[Tas and Wagner. "Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/tas2024neuripsw-words/)

BibTeX
@inproceedings{tas2024neuripsw-words,
  title = {{Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers}},
  author = {Tas, Omer Sahin and Wagner, Royden},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/tas2024neuripsw-words/}
}