Efficient Estimation of Kernel Surrogate Models for Task Attribution

Abstract

Modern AI agents such as large language models are trained simultaneously on diverse tasks, including translation, code generation, mathematical reasoning, and text prediction. A key question is how to quantify the influence of each individual training task on performance on a target task, a problem we refer to as *task attribution*. The direct approach, leave-one-out retraining, measures the effect of removing each task but is computationally infeasible at scale. A more recent line of work instead builds surrogate models that predict target-task performance for any subset of training tasks. Prior work focuses on linear surrogate models, which capture first-order relationships but miss nonlinear interactions such as synergy, antagonism, or XOR-type effects. In this paper, we first consider a unified task-weighting framework for analyzing task-attribution methods and, via a second-order analysis, establish a new connection between linear surrogate models and influence functions. We then introduce *kernel surrogate models*, which more effectively represent second-order task interactions. To learn the kernel surrogate efficiently, we develop a gradient-based estimation procedure that leverages a first-order approximation of pretrained models; empirically, this yields surrogate estimates with less than 2% relative error without repeated retraining. Experiments across multiple domains, including mathematical reasoning in transformers, in-context learning, and multi-objective reinforcement learning, demonstrate the effectiveness of kernel surrogate models: they achieve 25% higher correlation with the leave-one-out ground truth than linear surrogates and influence-function baselines, enabling more accurate and scalable task attribution. When used for downstream task selection, kernel surrogate models further yield a 40% improvement in demonstration selection on in-context learning and multi-objective reinforcement learning benchmarks.
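As a toy illustration of why a linear surrogate misses XOR-type interactions while a kernel surrogate can capture them, the Python sketch below fits both kinds of surrogate to every subset of four hypothetical training tasks, where tasks 1 and 2 (zero-indexed) interact XOR-style, and then reads off leave-one-out attributions from each surrogate. Everything here is an illustrative assumption: the scoring function `toy_performance`, the helpers `fit_linear`, `fit_kernel`, and `loo_scores`, the degree-2 polynomial kernel, and the ridge penalty are hypothetical choices. This is generic kernel ridge regression on subset indicator vectors, not the paper's gradient-based estimator, and it enumerates all subsets, which is feasible only because the example is tiny.

```python
import itertools

import numpy as np

# Hypothetical toy setup: 4 training tasks; a subset S is a 0/1 indicator vector s.
N_TASKS = 4


def toy_performance(s: np.ndarray) -> float:
    """Hypothetical target-task score. Task 0 helps additively, task 3 helps a little,
    and tasks 1 and 2 interact XOR-style: each helps alone, but together they cancel."""
    return 1.0 * s[0] + 1.0 * float(s[1] != s[2]) + 0.1 * s[3]


# Enumerate all 2^N_TASKS subsets (only feasible because N_TASKS is tiny).
X = np.array(list(itertools.product([0, 1], repeat=N_TASKS)), dtype=float)
y = np.array([toy_performance(s) for s in X])


def fit_linear(X, y, lam=1e-6):
    """Linear surrogate f(s) = b + w . s, fit by ridge-regularized least squares."""
    A = np.hstack([np.ones((len(X), 1)), X])
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return lambda Z: np.hstack([np.ones((len(Z), 1)), Z]) @ coef


def poly2_kernel(A, B):
    """Degree-2 polynomial kernel k(s, t) = (1 + s . t)^2; its feature space
    contains every pairwise product s_i * s_j of task indicators."""
    return (1.0 + A @ B.T) ** 2


def fit_kernel(X, y, lam=1e-6):
    """Kernel ridge regression surrogate f(s) = sum_j alpha_j k(s_j, s)."""
    K = poly2_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Z: poly2_kernel(Z, X) @ alpha


def loo_scores(surrogate):
    """Leave-one-out attribution under a surrogate: f(all tasks) - f(all tasks except i)."""
    full = np.ones((1, N_TASKS))
    scores = []
    for i in range(N_TASKS):
        drop_i = full.copy()
        drop_i[0, i] = 0.0
        scores.append(float(surrogate(full)[0] - surrogate(drop_i)[0]))
    return np.array(scores)


linear = fit_linear(X, y)
kernel = fit_kernel(X, y)
lin_mse = float(np.mean((linear(X) - y) ** 2))
ker_mse = float(np.mean((kernel(X) - y) ** 2))
lin_loo = loo_scores(linear)
ker_loo = loo_scores(kernel)
print(f"linear surrogate: MSE={lin_mse:.4f}, LOO={np.round(lin_loo, 3)}")
print(f"kernel surrogate: MSE={ker_mse:.2e}, LOO={np.round(ker_loo, 3)}")
```

Under these assumptions, the linear surrogate assigns tasks 1 and 2 roughly zero influence, whereas the kernel surrogate recovers that removing either one from the full set raises the toy score by about 1. The degree-2 kernel is used here because, on 0/1 indicators, its feature space spans exactly the constant, the individual task indicators, and all pairwise products, which is the kind of second-order interaction a linear surrogate cannot express.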

Cite

Text

Zhang et al. "Efficient Estimation of Kernel Surrogate Models for Task Attribution." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{zhang2026iclr-efficient,
  title     = {{Efficient Estimation of Kernel Surrogate Models for Task Attribution}},
  author    = {Zhang, Zhenshuo and Duan, Minxuan and Zhang, Hongyang R.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-efficient/}
}