Mixture of Latent Experts Using Tensor Products
Abstract
In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to \textit{negative transfer}. To mitigate this, we propose a novel \textit{latent-expert} approach (\texttt{TensorPoly}) that balances parameter efficiency with nuanced routing methods. For the \textit{experts}, we reparameterize Low-Rank Adaptation (\texttt{LoRA}) as an entangled tensor constructed via tensor product operations, and name the resulting approach \texttt{TLoRA}. For the \textit{routing function}, we tailor two routing functions of differing granularity: \texttt{TensorPoly-I}, which routes to each rank of the entangled tensor, and \texttt{TensorPoly-II}, which offers finer-grained routing that targets each order of the entangled tensor. The experimental results on the multi-task T0 benchmark demonstrate that: 1) all latent-expert approaches surpass the corresponding dense approaches, highlighting the potential of modular language models to mitigate negative transfer in multi-task learning and deliver superior outcomes; 2) \texttt{TensorPoly-I} achieves higher parameter efficiency in adaptation and outperforms other modular LMs, demonstrating the potential of our approach for multi-task transfer learning\footnote{The code is available at \url{https://github.com/microsoft/mttl}}.
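To make the mechanism described above concrete, the sketch below builds a LoRA-style weight update as a sum over ranks of Kronecker (tensor) products of small per-order factors, with a learned routing weight per rank in the spirit of \texttt{TensorPoly-I}. This is a minimal illustrative sketch: the class name, shapes, softmax routing, and factorization scheme are assumptions for exposition, not the paper's exact parameterization (see the released code for the authors' implementation).

```python
import torch
import torch.nn as nn

class TLoRASketch(nn.Module):
    """Illustrative sketch (not the paper's exact method): a LoRA-style
    update assembled from Kronecker products of small factors across
    several "orders", with one routing weight per rank as in the spirit
    of TensorPoly-I. TensorPoly-II would instead route each (rank, order)
    factor separately; that variant is omitted here."""

    def __init__(self, d_out=16, d_in=16, rank=4, order=2):
        super().__init__()
        # Each order contributes a small factor; the rank-k update is the
        # Kronecker product of its `order` factors, so parameters scale
        # with the small factor sizes rather than d_out * d_in.
        f_out = round(d_out ** (1.0 / order))
        f_in = round(d_in ** (1.0 / order))
        assert f_out ** order == d_out and f_in ** order == d_in
        self.factors = nn.Parameter(
            torch.randn(rank, order, f_out, f_in) * 0.02)
        # TensorPoly-I-style routing: one logit per rank.
        self.route_logits = nn.Parameter(torch.zeros(rank))
        self.order = order

    def delta_w(self):
        weights = torch.softmax(self.route_logits, dim=0)  # (rank,)
        delta = 0.0
        for k, w in enumerate(weights):
            # Entangle the per-order factors of rank k via Kronecker products.
            block = self.factors[k, 0]
            for j in range(1, self.order):
                block = torch.kron(block, self.factors[k, j])
            delta = delta + w * block  # accumulates a (d_out, d_in) update
        return delta

    def forward(self, x, base_weight):
        # Frozen base weight plus the routed tensor-product update.
        return x @ (base_weight + self.delta_w()).T

# Usage: apply the adapter on top of a (frozen) base weight matrix.
layer = TLoRASketch()
y = layer(torch.randn(2, 16), base_weight=torch.zeros(16, 16))
```

Note the parameter-efficiency argument this sketch encodes: with `order = 2`, a 16x16 update per rank costs two 4x4 factors (32 parameters) instead of a dense 256, and routing only needs one scalar per rank.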
Cite

Text:
Su et al. "Mixture of Latent Experts Using Tensor Products." Transactions on Machine Learning Research, 2024.

Markdown:
[Su et al. "Mixture of Latent Experts Using Tensor Products." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/su2024tmlr-mixture/)

BibTeX:
@article{su2024tmlr-mixture,
title = {{Mixture of Latent Experts Using Tensor Products}},
author = {Su, Zhan and Mo, Fengran and Tiwari, Prayag and Wang, Benyou and Li, Qiuchi and Nie, Jian-Yun and Simonsen, Jakob Grue},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/su2024tmlr-mixture/}
}