Feasible Action Search for Bandit Linear Programs via Thompson Sampling

Abstract

We study the ‘feasible action search’ (FAS) problem for linear bandits, wherein a learner attempts to discover a feasible point for a set of linear constraints $\Phi_* a \ge 0$, without knowledge of the matrix $\Phi_* \in \mathbb{R}^{m \times d}$. A FAS learner selects a sequence of actions $a_t$, and uses observations of the form $\Phi_* a_t + \mathrm{noise}$ to either find a point with nearly optimal ‘safety margin’, or detect that the constraints are infeasible, where the safety margin of an action measures its (signed) distance from the constraint boundary. While of interest in its own right, the FAS problem also directly addresses a key deficiency in the extant theory of ‘safe linear bandits’ (SLBs), by discovering a safe initialisation for low-regret SLB methods. We propose and analyse a novel efficient FAS learner. Our method, FAST, is based on Thompson Sampling. It applies a coupled random perturbation to an estimate of $\Phi_*$, and plays a maximin point of a game induced by this perturbed matrix. We prove that FAST stops in $\tilde{O}(d^3/\varepsilon^2 M_*^2)$ steps, and incurs $\tilde{O}(d^3/|M_*|)$ safety costs, to either correctly detect infeasibility, or output a point that is at least $(1-\varepsilon) M_*$-safe, where $M_*$ is the optimal safety margin of $\Phi_*$. Further, instantiating prior SLB methods with the output of FAST yields the first SLB methods that incur $\tilde{O}(\sqrt{d^3 T/M_*^2})$ regret and $O(1)$ risk without a priori knowledge of a safe action. The main technical novelty lies in the extension of Thompson Sampling to this multiobjective setting, for which we both propose a coupled noise design, and provide an analysis that avoids convexity considerations.
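The loop the abstract describes (estimate $\Phi_*$ from bandit feedback, perturb the estimate, play a maximin point of the induced game) can be illustrated with a minimal sketch. This is not the authors' algorithm: the ridge estimator, the specific "coupled" perturbation (one shared Gaussian direction, scaled by $V^{-1/2}$, added to every row), the unit-ball action set, and the subgradient maximin solver are all simplifying assumptions made here for illustration.

```python
import numpy as np

def projected_subgradient_maximin(phi, a0, steps=50, lr=0.2):
    """Approximate max_{||a||<=1} min_i (phi @ a)_i.

    min of linear functions is concave, so projected subgradient
    ascent converges; a few dozen steps suffice for a sketch.
    """
    a = a0.copy()
    for _ in range(steps):
        i = int(np.argmin(phi @ a))
        a = a + lr * phi[i]          # subgradient of a -> min_i phi_i . a
        n = np.linalg.norm(a)
        if n > 1.0:
            a /= n                   # project onto the unit ball
    return a

def fas_thompson_sketch(phi_star, T=300, noise_sd=0.05, seed=0):
    """Illustrative Thompson-sampling-style feasibility search.

    phi_star is the unknown (m, d) constraint matrix, used only to
    generate noisy bandit observations phi_star @ a_t + noise.
    Returns the last action played and its true safety margin.
    """
    rng = np.random.default_rng(seed)
    m, d = phi_star.shape
    V = np.eye(d)                    # ridge Gram matrix (lambda = 1)
    S = np.zeros((m, d))             # running sum of y_t a_t^T

    a = np.ones(d) / np.sqrt(d)      # arbitrary initial action
    for _ in range(T):
        y = phi_star @ a + noise_sd * rng.standard_normal(m)
        V += np.outer(a, a)
        S += np.outer(y, a)
        phi_hat = S @ np.linalg.inv(V)           # row-wise ridge estimate

        # Coupled perturbation (one interpretation): a single Gaussian
        # direction in parameter space, shared across all m rows.
        Vinv_sqrt = np.linalg.cholesky(np.linalg.inv(V))
        xi = Vinv_sqrt @ rng.standard_normal(d)
        phi_tilde = phi_hat + np.outer(np.ones(m), xi)

        # Play a maximin point of the game induced by the perturbed matrix.
        a = projected_subgradient_maximin(phi_tilde, a)
    return a, float(np.min(phi_star @ a))
```

On a clearly feasible instance the returned margin should be positive once the estimate concentrates, since the perturbation scale shrinks as the Gram matrix $V$ grows; infeasibility detection and the stopping rule from the paper are omitted from this sketch.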

Cite

Text

Gangrade et al. "Feasible Action Search for Bandit Linear Programs via Thompson Sampling." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Gangrade et al. "Feasible Action Search for Bandit Linear Programs via Thompson Sampling." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/gangrade2025icml-feasible/)

BibTeX

@inproceedings{gangrade2025icml-feasible,
  title     = {{Feasible Action Search for Bandit Linear Programs via Thompson Sampling}},
  author    = {Gangrade, Aditya and Pacchiano, Aldo and Scott, Clayton and Saligrama, Venkatesh},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {18228--18256},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/gangrade2025icml-feasible/}
}