Jorge Cortés
Professor
Cymer Corporation Endowed Chair
Anytime safe reinforcement learning
P. Mestres, A. Marzabal, J. Cortés
Conference on Learning for Dynamics and Control, Ann Arbor, Michigan, 2025, to appear
*Best Paper Award Finalist*
Abstract
This paper considers the problem of solving
constrained reinforcement learning problems with
anytime guarantees, meaning that the algorithmic
solution returns a safe policy regardless of when it
is terminated. Drawing inspiration from anytime
constrained optimization, we introduce Reinforcement
Learning-based Safe Gradient Flow (RL-SGF), an
on-policy algorithm which employs estimates of the
value functions and their respective gradients
associated with the objective and safety constraints
for the current policy, and updates the policy
parameters by solving a convex quadratically
constrained quadratic program. We show that if the
estimates are computed with a sufficiently large
number of episodes (for which we provide an explicit
bound), safe policies are updated to safe policies
with a probability higher than a prescribed
tolerance. We also show that iterates asymptotically
converge to a neighborhood of a KKT point, whose
size can be arbitrarily reduced by refining the
estimates of the value functions and their
gradients. We illustrate the performance of RL-SGF
in a navigation example.
pdf
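
The abstract does not spell out the convex QCQP that RL-SGF solves at each iteration, so the following is only a minimal sketch of what such a safe-gradient-flow-style update could look like, not the paper's actual program. It assumes the update direction is chosen close to the estimated objective gradient, subject to linearized safety constraints with a class-K-style decay rate and a quadratic trust region; the names `policy_update`, `alpha`, and `radius` are hypothetical.

```python
# Hypothetical sketch of one RL-SGF-style policy update (NOT the paper's
# exact program): given Monte Carlo estimates of the objective gradient and
# the safety value functions and their gradients for the current policy,
# pick the update direction by solving a small convex QCQP with cvxpy.
import numpy as np
import cvxpy as cp

def policy_update(theta, grad_obj, v_safe, grad_safe, alpha=1.0, radius=0.1):
    """One illustrative update step.

    theta     : current policy parameters, shape (n,)
    grad_obj  : estimated gradient of the objective value function, shape (n,)
    v_safe    : estimated safety value functions, >= 0 means safe, shape (m,)
    grad_safe : estimated gradients of the safety value functions, shape (m, n)
    alpha, radius : illustrative design constants (assumptions, not from the paper)
    """
    n = theta.shape[0]
    xi = cp.Variable(n)  # candidate update direction
    # Stay close to the estimated ascent direction of the objective...
    objective = cp.Minimize(cp.sum_squares(xi - grad_obj))
    constraints = [
        # ...while the linearized safety values may decay no faster than an
        # exponential rate alpha, which keeps safe iterates safe,
        grad_safe @ xi >= -alpha * v_safe,
        # and a quadratic trust region bounds the step
        # (this constraint is what makes the program a QCQP rather than a QP).
        cp.sum_squares(xi) <= radius**2,
    ]
    cp.Problem(objective, constraints).solve()
    return theta + xi.value
```

In this sketch the linear constraints mirror the abstract's guarantee that safe policies are updated to safe policies (up to estimation error in the gradients), while the quadratic trust region limits how far a single update can move the parameters given noisy value-function estimates.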
Mechanical and Aerospace Engineering,
University of California, San Diego
9500 Gilman Dr,
La Jolla, California, 92093-0411
Ph: 1-858-822-7930
Fax: 1-858-822-3107
cortes at ucsd.edu
Skype id: jorgilliyo