Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

Abstract

Currently, most machine learning models are trained by centralized teams and are rarely updated. In contrast, open-source software development involves the iterative development of a shared artifact through distributed collaboration using a version control system. In the interest of enabling collaborative and continual improvement of machine learning models (Raffel, 2023), we introduce Git-Theta, a version control system for machine learning models. Git-Theta is an extension to Git, the most widely used version control software, that allows fine-grained tracking of changes to model parameters alongside code and other artifacts. Unlike existing version control systems that treat a model checkpoint as a blob of data, Git-Theta leverages the structure of checkpoints to support communication-efficient updates, automatic model merges, and meaningful reporting about the difference between two versions of a model. In addition, Git-Theta includes a plug-in system that enables users to easily add support for new functionality. In this paper, we introduce Git-Theta’s design and features and include an example use case of Git-Theta where a pre-trained model is continually adapted and modified. We publicly release Git-Theta at https://github.com/r-three/git-theta/ in hopes of kickstarting a new era of collaborative model development.
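
To make the contrast with blob-level tracking concrete, the following is a minimal illustrative sketch, not Git-Theta's implementation or API. It assumes a checkpoint can be viewed as a mapping from parameter-group names to values; the `Checkpoint` type, the `diff_checkpoints` function, and the parameter names are all hypothetical. Comparing two checkpoints group by group shows which groups would need to be stored or communicated after an update.

```python
from typing import Dict, List

# Hypothetical representation: parameter-group name -> flat list of values.
Checkpoint = Dict[str, List[float]]


def diff_checkpoints(old: Checkpoint, new: Checkpoint) -> Dict[str, List[str]]:
    """Classify each parameter group as added, removed, modified, or unchanged."""
    report: Dict[str, List[str]] = {
        "added": [],
        "removed": [],
        "modified": [],
        "unchanged": [],
    }
    # Walk the union of group names so additions and removals are both seen.
    for name in sorted(set(old) | set(new)):
        if name not in old:
            report["added"].append(name)
        elif name not in new:
            report["removed"].append(name)
        elif old[name] != new[name]:
            report["modified"].append(name)
        else:
            report["unchanged"].append(name)
    return report


# Illustrative checkpoints: one group is fine-tuned and one group is added.
base = {"encoder.weight": [0.1, 0.2], "decoder.weight": [0.3, 0.4]}
tuned = {
    "encoder.weight": [0.1, 0.2],
    "decoder.weight": [0.3, 0.5],
    "adapter.weight": [0.0],
}

report = diff_checkpoints(base, tuned)
print(report)
```

In this sketch only `decoder.weight` and `adapter.weight` differ from the base checkpoint, so a system that tracks parameter groups individually would need to store just those two groups, whereas a blob-level system would store the whole file again.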

Cite

Text

Kandpal et al. "Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models." International Conference on Machine Learning, 2023.

Markdown

[Kandpal et al. "Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/kandpal2023icml-gittheta/)

BibTeX

@inproceedings{kandpal2023icml-gittheta,
  title     = {{Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models}},
  author    = {Kandpal, Nikhil and Lester, Brian and Muqeeth, Mohammed and Mascarenhas, Anisha and Evans, Monty and Baskaran, Vishal and Huang, Tenghao and Liu, Haokun and Raffel, Colin},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {15708--15719},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/kandpal2023icml-gittheta/}
}