Private Federated Learning Using Preference-Optimized Synthetic Data
Abstract
In practical settings, differentially private federated learning (DP-FL) is the dominant method for training models from private, on-device client data (McMahan et al., 2017; Kairouz et al., 2021; Choquette-Choo et al., 2024). However, recent work suggests that DP-FL may be enhanced or even outperformed by methods that rely on DP synthetic data (Wu et al., 2024; Hou et al., 2024). The primary algorithms for generating DP synthetic data for FL applications require careful prompt engineering, with prompts based on public information and/or iterative private client feedback (Wu et al., 2024; Hou et al., 2024). Our key insight is that the private client feedback collected by prior synthetic data methods (Hou et al., 2024; Xie et al., 2024) can be viewed as a preference ranking. Hence, we can harness client feedback more effectively using powerful preference optimization algorithms such as Direct Preference Optimization (DPO) (Rafailov et al., 2023) to fine-tune LLMs that generate high-quality DP synthetic data; we call this approach POPri. POPri substantially improves the utility of DP synthetic data relative to prior work: on our bioRxiv dataset, it closes the gap between next-token prediction accuracy in the fully private and non-private settings by up to 68%, compared to 52% for prior synthetic data methods and 10% for state-of-the-art DP federated learning methods. We showcase the performance of POPri on (1) an existing benchmark from Xie et al. (2024), and (2) LargeFedBench, a new federated text benchmark that we curate and release for uncontaminated LLM evaluations on federated client data.
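To make the core idea concrete, here is a minimal sketch of the preference-optimization step described above: client feedback scores over candidate synthetic samples are treated as a preference ranking, converted into chosen/rejected pairs, and plugged into the standard DPO objective (Rafailov et al., 2023). The helper names (`rank_to_pairs`, `dpo_loss`), the pairing heuristic, the `beta` value, and the stand-in log-probabilities are illustrative assumptions, not the paper's implementation; in practice the log-probabilities would come from the LLM being fine-tuned and a frozen reference model, and the feedback would be aggregated under DP.

```python
# Minimal sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn.functional as F


def rank_to_pairs(samples, scores):
    """Turn feedback scores over synthetic samples into (chosen, rejected) pairs."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    # Pair the highest-ranked sample with the lowest-ranked one, and so on inward.
    return [(samples[order[i]], samples[order[-(i + 1)]])
            for i in range(len(order) // 2)]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective on sequence log-probabilities of each pair."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()


# Toy usage with stand-in log-probabilities (in practice these come from the LLMs).
pairs = rank_to_pairs(["s1", "s2", "s3", "s4"], scores=[0.9, 0.1, 0.4, 0.7])
loss = dpo_loss(torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -6.5]),
                torch.tensor([-5.5, -6.2]), torch.tensor([-6.8, -6.4]))
print(pairs, loss.item())
```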
Cite

Text

Hou et al. "Private Federated Learning Using Preference-Optimized Synthetic Data." ICLR 2025 Workshops: SynthData, 2025.

Markdown

[Hou et al. "Private Federated Learning Using Preference-Optimized Synthetic Data." ICLR 2025 Workshops: SynthData, 2025.](https://mlanthology.org/iclrw/2025/hou2025iclrw-private/)

BibTeX

@inproceedings{hou2025iclrw-private,
  title     = {{Private Federated Learning Using Preference-Optimized Synthetic Data}},
  author    = {Hou, Charlie and Wang, Mei-Yu and Zhu, Yige and Lazar, Daniel and Fanti, Giulia},
  booktitle = {ICLR 2025 Workshops: SynthData},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/hou2025iclrw-private/}
}