Can a Transformer Represent a Kalman Filter?
Abstract
Transformers are a class of autoregressive deep learning architectures which have recently achieved state-of-the-art performance in various vision, language, and robotics tasks. We revisit the problem of Kalman Filtering in linear dynamical systems and show that Transformers can approximate the Kalman Filter in a strong sense. Specifically, for any observable LTI system we construct an explicit causally-masked Transformer which implements the Kalman Filter, up to a small additive error which is bounded uniformly in time; we call our construction the Transformer Filter. Our construction is based on a two-step reduction. We first show that a softmax self-attention block can exactly represent a Nadaraya–Watson kernel smoothing estimator with a Gaussian kernel. We then show that this estimator closely approximates the Kalman Filter. We also investigate how the Transformer Filter can be used for measurement-feedback control and prove that the resulting nonlinear controllers closely approximate the performance of standard optimal control policies such as the LQG controller.
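The first step of the paper's reduction, that a softmax self-attention block can exactly compute a Nadaraya–Watson kernel smoother with a Gaussian kernel, is easy to check numerically. The sketch below is illustrative rather than the paper's construction: the bandwidth h, the random data, and the choice to fold the squared key norms into the attention logits as a bias are all assumptions made for this demonstration.

import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4        # number of key/value pairs, feature dimension
h = 0.7             # Gaussian kernel bandwidth (illustrative)

q = rng.standard_normal(d)        # query
K = rng.standard_normal((n, d))   # keys
V = rng.standard_normal((n, 1))   # values

# Nadaraya-Watson estimate with a Gaussian kernel.
w = np.exp(-np.sum((q - K) ** 2, axis=1) / (2 * h**2))
nw = (w / w.sum()) @ V

# Softmax attention with logits q.k_i / h^2 - |k_i|^2 / (2 h^2).
# The omitted -|q|^2 / (2 h^2) term is constant in i and cancels in the
# softmax, so these logits equal -|q - k_i|^2 / (2 h^2) up to a constant.
logits = K @ q / h**2 - np.sum(K**2, axis=1) / (2 * h**2)
attn = np.exp(logits - logits.max())
attn /= attn.sum()
out = attn @ V

assert np.allclose(nw, out)  # the two estimators agree to floating point

The key-norm bias is what makes the match exact; when all keys have the same norm it is constant and plain scaled dot-product attention already realizes the Gaussian-kernel smoother.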
Cite
Text

Goel and Bartlett. "Can a Transformer Represent a Kalman Filter?" Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.

Markdown

[Goel and Bartlett. "Can a Transformer Represent a Kalman Filter?" Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.](https://mlanthology.org/l4dc/2024/goel2024l4dc-transformer/)

BibTeX
@inproceedings{goel2024l4dc-transformer,
title = {{Can a Transformer Represent a Kalman Filter?}},
author = {Goel, Gautam and Bartlett, Peter},
booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
year = {2024},
pages = {1502--1512},
volume = {242},
url = {https://mlanthology.org/l4dc/2024/goel2024l4dc-transformer/}
}