Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts

Abstract

Large Language Models (LLMs) are increasingly used worldwide across a broad range of applications, yet ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we compare merging models trained with diverse safety data against data mixing strategies as a way to enhance safety across languages. We observe substantial gains from merging, with improvements of up to 10% in safety and 8% in general performance across six languages. We also extend the multilingual coverage of models by combining monolingual models, yielding approximately a 7% improvement in safety and 4% in general performance. Our experiments show that not all merging algorithms consistently yield improvements, particularly when balancing the competing objectives of safety and general performance in a multilingual context. Overall, our comparison reveals that model merging generally outperforms data mixing in striking a balance between safety and general performance.
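
To make the "model merging" idea concrete, the sketch below shows simple parameter-wise linear interpolation between two checkpoints that share an architecture. This is only an illustrative assumption, not the specific merging algorithms evaluated in the paper; the `merge_state_dicts` helper, the toy `nn.Linear` models, and the `alpha` weighting are hypothetical.

```python
# Minimal sketch of linear weight merging between two models with identical
# architectures (illustrative assumption, not the paper's exact recipe).
import torch
import torch.nn as nn


def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return a parameter-wise interpolation: alpha * A + (1 - alpha) * B."""
    return {name: alpha * sd_a[name] + (1.0 - alpha) * sd_b[name]
            for name in sd_a}


if __name__ == "__main__":
    # Two toy models standing in for, e.g., a safety-tuned checkpoint and a
    # general-purpose checkpoint of the same architecture.
    safety_model = nn.Linear(16, 4)
    general_model = nn.Linear(16, 4)

    merged_sd = merge_state_dicts(safety_model.state_dict(),
                                  general_model.state_dict(),
                                  alpha=0.5)

    merged_model = nn.Linear(16, 4)
    merged_model.load_state_dict(merged_sd)
    print({name: tuple(p.shape) for name, p in merged_sd.items()})
```

In practice, `alpha` (or per-parameter weightings in more elaborate schemes) controls the trade-off between the contributing models, which is where the safety-versus-general-performance balance discussed in the abstract comes in.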

Cite

Text

Aakanksha et al. "Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts." NeurIPS 2024 Workshops: SafeGenAi, 2024.

Markdown

[Aakanksha et al. "Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts." NeurIPS 2024 Workshops: SafeGenAi, 2024.](https://mlanthology.org/neuripsw/2024/aakanksha2024neuripsw-mix/)

BibTeX

@inproceedings{aakanksha2024neuripsw-mix,
  title     = {{Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts}},
  author    = {Aakanksha and Ahmadian, Arash and Goldfarb-Tarrant, Seraphina and Ermis, Beyza and Fadaee, Marzieh and Hooker, Sara},
  booktitle = {NeurIPS 2024 Workshops: SafeGenAi},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/aakanksha2024neuripsw-mix/}
}