Private Federated Learning Using Preference-Optimized Synthetic Data
Abstract
In practical settings, differentially private federated learning (DP-FL) is the dominant method for training models from private, on-device client data (McMahan et al., 2017; Kairouz et al., 2021; Choquette-Choo et al., 2024). However, recent work suggests that DP-FL may be enhanced or even outperformed by methods that rely on DP synthetic data (Wu et al., 2024; Hou et al., 2024). The primary algorithms for generating DP synthetic data for FL applications require careful prompt engineering, with prompts based on public information and/or iterative private client feedback (Wu et al., 2024; Hou et al., 2024). Our key insight is that the private client feedback collected by prior synthetic data methods (Hou et al., 2024; Xie et al., 2024) can be viewed as a preference ranking. Hence, we can harness client feedback more effectively using powerful preference optimization algorithms such as Direct Preference Optimization (DPO) (Rafailov et al., 2023) to fine-tune LLMs that generate high-quality DP synthetic data; we call this approach POPri. POPri substantially improves the utility of DP synthetic data relative to prior work: on our bioRxiv dataset, it closes the gap between next-token prediction accuracy in the fully private and non-private settings by up to 68%, compared to 52% for prior synthetic data methods and 10% for state-of-the-art DP federated learning methods. We showcase the performance of POPri on (1) an existing benchmark from Xie et al. (2024), and (2) LargeFedBench, a new federated text benchmark that we curate and release for uncontaminated LLM evaluations on federated client data.
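To make the core idea concrete, here is a minimal sketch of the preference-optimization step described above: client feedback scores over candidate synthetic samples are treated as a preference ranking, converted into chosen/rejected pairs, and plugged into the standard DPO objective (Rafailov et al., 2023). The helper names (`rank_to_pairs`, `dpo_loss`), the pairing heuristic, the `beta` value, and the stand-in log-probabilities are illustrative assumptions, not the paper's implementation; in practice the log-probabilities would come from the LLM being fine-tuned and a frozen reference model, and the feedback would be aggregated under DP.

```python
# Minimal sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn.functional as F


def rank_to_pairs(samples, scores):
    """Turn feedback scores over synthetic samples into (chosen, rejected) pairs."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    # Pair the highest-ranked sample with the lowest-ranked one, and so on inward.
    return [(samples[order[i]], samples[order[-(i + 1)]])
            for i in range(len(order) // 2)]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective on sequence log-probabilities of each pair."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()


# Toy usage with stand-in log-probabilities (in practice these come from the LLMs).
pairs = rank_to_pairs(["s1", "s2", "s3", "s4"], scores=[0.9, 0.1, 0.4, 0.7])
loss = dpo_loss(torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -6.5]),
                torch.tensor([-5.5, -6.2]), torch.tensor([-6.8, -6.4]))
print(pairs, loss.item())
```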
Cite

Text

Hou et al. "Private Federated Learning Using Preference-Optimized Synthetic Data." ICLR 2025 Workshops: SynthData, 2025.

Markdown

[Hou et al. "Private Federated Learning Using Preference-Optimized Synthetic Data." ICLR 2025 Workshops: SynthData, 2025.](https://mlanthology.org/iclrw/2025/hou2025iclrw-private/)

BibTeX

@inproceedings{hou2025iclrw-private,
  title     = {{Private Federated Learning Using Preference-Optimized Synthetic Data}},
  author    = {Hou, Charlie and Wang, Mei-Yu and Zhu, Yige and Lazar, Daniel and Fanti, Giulia},
  booktitle = {ICLR 2025 Workshops: SynthData},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/hou2025iclrw-private/}
}