Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Abstract

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
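
To make the idea in the abstract concrete, below is a minimal sketch of a learnable Fourier-feature positional encoder: positions are linearly projected by a trainable matrix, mapped through cosine and sine, scaled, and then modulated by a small MLP. The class name, layer sizes, and the GELU activation are illustrative assumptions for this sketch, not the authors' exact configuration.

```python
import math
import torch
import torch.nn as nn

class LearnableFourierPositionalEncoding(nn.Module):
    """Sketch of a learnable Fourier-feature positional encoding
    (hyperparameters and layer choices are assumptions, not the
    paper's exact settings)."""

    def __init__(self, pos_dim: int = 2, fourier_dim: int = 64,
                 hidden_dim: int = 128, out_dim: int = 256):
        super().__init__()
        assert fourier_dim % 2 == 0
        # Trainable projection of positions onto Fourier frequencies.
        self.w_r = nn.Linear(pos_dim, fourier_dim // 2, bias=False)
        # MLP that modulates the Fourier features into the final encoding.
        self.mlp = nn.Sequential(
            nn.Linear(fourier_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )
        self.scale = 1.0 / math.sqrt(fourier_dim)

    def forward(self, pos: torch.Tensor) -> torch.Tensor:
        # pos: (..., pos_dim), e.g. normalized (row, col) pixel coordinates.
        proj = self.w_r(pos)  # (..., fourier_dim // 2)
        feats = self.scale * torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
        return self.mlp(feats)  # (..., out_dim)


# Usage example: encode a 16x16 grid of 2-D pixel positions.
ys, xs = torch.meshgrid(torch.arange(16.0), torch.arange(16.0), indexing="ij")
positions = torch.stack([ys, xs], dim=-1).reshape(-1, 2) / 16.0
encodings = LearnableFourierPositionalEncoding()(positions)  # shape (256, 256)
```

Because the frequency matrix is trained end to end, the encoder can adapt to whatever spatial relationships (e.g., $L_2$ distances between pixel positions) the downstream attention model needs, rather than relying on fixed sinusoidal frequencies.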

Cite

Text

Li et al. "Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding." Neural Information Processing Systems, 2021.

Markdown

[Li et al. "Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/li2021neurips-learnable/)

BibTeX

@inproceedings{li2021neurips-learnable,
  title     = {{Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding}},
  author    = {Li, Yang and Si, Si and Li, Gang and Hsieh, Cho-Jui and Bengio, Samy},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/li2021neurips-learnable/}
}