Mixture of Latent Experts Using Tensor Products

Zhan Su, Fengran Mo, Prayag Tiwari, Benyou Wang, Qiuchi Li, Jian-Yun Nie, Jakob Grue Simonsen

TMLR 2024

Abstract

In multi-task learning, the conventional approach trains a single model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to negative transfer. To mitigate this, we propose a novel latent-expert approach, TensorPoly, that balances parameter efficiency with fine-grained routing. For the experts, we reparameterize Low-Rank Adaptation (LoRA) as an entangled tensor built with tensor product operations, and name the resulting approach TLoRA. For the routing function, we design two routing functions that differ in granularity: TensorPoly-I routes to each rank within the entangled tensor, while TensorPoly-II offers finer-grained routing that targets each order of the entangled tensor. Experimental results on the multi-task T0 benchmark demonstrate that: 1) all latent-expert approaches surpass the corresponding dense approaches, highlighting the potential of modular language models to mitigate negative transfer in multi-task learning and deliver superior outcomes; 2) TensorPoly-I achieves higher parameter efficiency in adaptation and outperforms other modular LMs, which shows the potential of our approach in multi-task transfer learning. The code is released at https://github.com/microsoft/mttl.
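The idea of building a large parameter vector from tensor products of small factors, and of routing at the rank level or the order level, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the released code is the reference for that). The function names, the array shapes, and the specific reading of the two routing granularities are assumptions made purely for illustration.

```python
from functools import reduce

import numpy as np


def entangled_vector(factors, rank_weights=None):
    """Build a length p**N vector as sum_r w_r * (a_r^1 kron ... kron a_r^N).

    factors: array of shape (R, N, p), i.e. R rank components, each made of
             N small vectors of size p (hypothetical layout).
    rank_weights: optional array of shape (R,), one routing weight per rank
                  (an assumed stand-in for rank-level routing).
    """
    R, N, p = factors.shape
    if rank_weights is None:
        rank_weights = np.ones(R)
    out = np.zeros(p ** N)
    for r in range(R):
        # Kronecker product of the N small vectors of rank component r.
        out += rank_weights[r] * reduce(np.kron, factors[r])
    return out


def entangled_vector_per_order(factors, order_weights):
    """Mix rank components separately within each order, then take the product.

    order_weights: array of shape (N, R), one routing weight per (order, rank)
                   pair (an assumed stand-in for order-level routing).
    """
    # mixed[n] = sum_r order_weights[n, r] * factors[r, n]
    mixed = np.einsum("nr,rnp->np", order_weights, factors)
    return reduce(np.kron, mixed)


rng = np.random.default_rng(0)
R, N, p = 2, 3, 4
factors = rng.normal(size=(R, N, p))

dense = entangled_vector(factors)
print(dense.shape)    # length is p**N
print(factors.size)   # number of stored parameters is R*N*p
```

In this toy setting the stored factors hold R·N·p values while the assembled vector has p^N entries, which is the sense in which a tensor-product parameterization can be more parameter-efficient than storing the vector directly.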

Cite

Text

Su et al. "Mixture of Latent Experts Using Tensor Products." Transactions on Machine Learning Research, 2024.

Markdown

[Su et al. "Mixture of Latent Experts Using Tensor Products." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/su2024tmlr-mixture/)

BibTeX

@article{su2024tmlr-mixture,
  title     = {{Mixture of Latent Experts Using Tensor Products}},
  author    = {Su, Zhan and Mo, Fengran and Tiwari, Prayag and Wang, Benyou and Li, Qiuchi and Nie, Jian-Yun and Simonsen, Jakob Grue},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/su2024tmlr-mixture/}
}