ML Anthology
McAleer, Stephen Marcus
19 publications
TMLR
2025
Tree Search for Language Model Agents
Jing Yu Koh, Stephen Marcus McAleer, Daniel Fried, Ruslan Salakhutdinov
ICML
2024
AlphaZero-like Tree-Search Can Guide Large Language Model Decoding and Training
Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang
ICLR
2024
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca Dragan, Stephen Marcus McAleer
ICLR
2024
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen Marcus McAleer
ICLR
2024
Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks
Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, Christian Schroeder de Witt
ICLR
2024
Llemma: An Open Language Model for Mathematics
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen Marcus McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
ICLR
2024
Toward Optimal Policy Population Growth in Two-Player Zero-Sum Games
Stephen Marcus McAleer, JB Lanier, Kevin A. Wang, Pierre Baldi, Tuomas Sandholm, Roy Fox
ICML
2023
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
Oliver Slumbers, David Henry Mguni, Stefano B. Blumberg, Stephen Marcus McAleer, Yaodong Yang, Jun Wang
ICLR
2023
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen Marcus McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
ICMLW
2023
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Tuomas Sandholm, Furong Huang, Stephen Marcus McAleer
ICMLW
2023
Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers
Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, Christian Schroeder de Witt
TMLR
2023
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Marcus McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang
ICML
2023
MANSA: Learning Fast and Slow in Multi-Agent Systems
David Henry Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Longfei Yue, Xidong Feng, Stephen Marcus McAleer, Feifei Tong, Jun Wang, Yaodong Yang
ICML
2023
Regret-Minimizing Double Oracle for Extensive-Form Games
Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang
NeurIPSW
2022
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen Marcus McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
NeurIPSW
2022
Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments
John Banister Lanier, Stephen Marcus McAleer, Pierre Baldi, Roy Fox
TMLR
2022
Online Double Oracle
Le Cong Dinh, Stephen Marcus McAleer, Zheng Tian, Nicolas Perez-Nieves, Oliver Slumbers, David Henry Mguni, Jun Wang, Haitham Bou Ammar, Yaodong Yang
NeurIPSW
2021
Target Entropy Annealing for Discrete Soft Actor-Critic
Yaosheng Xu, Dailin Hu, Litian Liang, Stephen Marcus McAleer, Pieter Abbeel, Roy Fox
NeurIPSW
2021
Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates
Litian Liang, Yaosheng Xu, Stephen Marcus McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox