Efficient Evaluation of Multi-Task Robot Policies with Active Experiment Selection

Abstract

Evaluating learned robot control policies costs the experimenter time and effort. As robots become capable of more diverse tasks, evaluation becomes harder, because it is impractical to test every policy on every task multiple times. Rather than considering the average performance of a policy on a task, we consider the distribution of performance over time. In a multi-task policy evaluation setting, we actively model the distribution of robot performance across multiple tasks and policies as we sequentially execute experiments. We show that natural language is a useful prior for modeling relationships between tasks, because tasks often share similarities that can reveal relationships in policy behavior. We leverage this formulation to reduce experimenter effort, using a cost-aware information gain heuristic to select informative trials efficiently. We conduct experiments on existing evaluation data from real robots and simulations and find a 50% reduction in estimates of the mean performance given a fixed cost budget. We encourage the use of our surrogate model as a scalable approach to track progress in evaluation.
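As a rough illustration of what a cost-aware information gain heuristic can look like, the sketch below scores each candidate (policy, task) trial by expected information gain per unit cost and picks the best one. It is not the authors' surrogate model. It assumes binary success/failure outcomes, an independent Beta(1, 1) prior per (policy, task) pair, and ignores the natural-language prior over tasks entirely. The function names, the example policies and tasks, and the cost values are all hypothetical.

```python
import math


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, n + 1))


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def expected_info_gain(successes, failures):
    """Mutual information between the next trial outcome and the unknown
    success rate, under a Beta(1, 1) prior updated with the observed counts.

    For integer Beta parameters the digamma differences in the closed form
    reduce to differences of harmonic numbers, so only the stdlib is needed.
    """
    a, b = successes + 1, failures + 1  # posterior Beta(a, b); assumed prior
    n = a + b
    h_n = harmonic(n)
    # Expected entropy of the outcome if the success rate were known.
    expected_conditional = -(
        a / n * (harmonic(a) - h_n) + b / n * (harmonic(b) - h_n)
    )
    # Entropy of the outcome under the posterior predictive, minus the above.
    return binary_entropy(a / n) - expected_conditional


def select_next(counts, costs):
    """Pick the (policy, task) pair with the highest gain per unit cost."""
    return max(counts, key=lambda k: expected_info_gain(*counts[k]) / costs[k])


# Hypothetical state: (successes, failures) observed so far, and trial costs.
counts = {
    ("policy_a", "pick"): (0, 0),
    ("policy_a", "pour"): (9, 9),
    ("policy_b", "pick"): (1, 0),
}
costs = {
    ("policy_a", "pick"): 4.0,
    ("policy_a", "pour"): 1.0,
    ("policy_b", "pick"): 1.0,
}

next_trial = select_next(counts, costs)
print(next_trial)
```

In this toy setup the untested pair has the largest raw information gain, but its higher cost makes the cheap, barely tested pair the better choice per unit cost. A model that shares information across tasks, as the abstract describes, would replace the independent Beta posteriors used here.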

Cite

Text

Anwar et al. "Efficient Evaluation of Multi-Task Robot Policies with Active Experiment Selection." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Anwar et al. "Efficient Evaluation of Multi-Task Robot Policies with Active Experiment Selection." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/anwar2025corl-efficient/)

BibTeX

@inproceedings{anwar2025corl-efficient,
  title     = {{Efficient Evaluation of Multi-Task Robot Policies with Active Experiment Selection}},
  author    = {Anwar, Abrar and Gupta, Rohan and Merchant, Zain and Ghosh, Sayan and Neiswanger, Willie and Thomason, Jesse},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {1636-1653},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/anwar2025corl-efficient/}
}