Jorge Cortés
Professor
Cymer Corporation Endowed Chair
Multi-agent Q-learning via best choice dynamics
S. Addams, J. Cortés
IEEE Access, submitted
Abstract
Motivated by multi-agent Q-learning scenarios, this paper introduces a
distributed action selection algorithm that relies on individual
agents interacting with local neighbors to learn a joint action. The
algorithm, termed Best Choice Dynamics, has each agent communicate its
currently planned action to its neighbors, who in turn use this
information to update their own actions. We characterize the
algorithm's convergence and its robustness against message losses, showing
that it converges to locally optimal joint actions in finite time. We
also discuss its relative advantages with respect to message-passing
algorithms and best response dynamics regarding convergence
guarantees, lack of oscillations, and communication complexity. We
illustrate the algorithm's performance in various simulation scenarios,
including both online training and offline training with distributed
online roll-out.
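The neighbor-based action update described in the abstract can be illustrated with a small sketch. The abstract does not spell out the Best Choice Dynamics update rule, so the code below is not the paper's algorithm: it is a plain sequential best-response update (the baseline the abstract compares against), applied to a joint Q-function that is assumed to decompose into pairwise terms over the communication graph. The names `local_value`, `neighbor_based_selection`, `neighbors`, and `q_pair` are hypothetical and chosen only for illustration.

```python
def local_value(agent, action, actions, neighbors, q_pair):
    """Sum of the pairwise Q-terms involving `agent` if it plays `action`.

    Assumption: the joint Q-function is a sum of terms q_pair[(i, j)][a_i][a_j]
    with i < j, one per edge of the communication graph.
    """
    total = 0.0
    for nbr in neighbors[agent]:
        i, j = min(agent, nbr), max(agent, nbr)
        a_i = action if i == agent else actions[i]
        a_j = action if j == agent else actions[j]
        total += q_pair[(i, j)][a_i][a_j]
    return total


def neighbor_based_selection(neighbors, q_pair, n_actions, init, max_rounds=100):
    """Sequential best-response sketch (not Best Choice Dynamics itself).

    Each agent, in turn, reads its neighbors' planned actions and switches
    only if another action strictly improves its local value. Every switch
    strictly increases the summed pairwise Q-value, so the loop stops after
    finitely many switches; `max_rounds` is only a safety guard.
    """
    actions = list(init)
    for _ in range(max_rounds):
        changed = False
        for agent in range(len(actions)):
            best_a = actions[agent]
            best_v = local_value(agent, best_a, actions, neighbors, q_pair)
            for a in range(n_actions):
                v = local_value(agent, a, actions, neighbors, q_pair)
                if v > best_v:  # strict improvement avoids cycling on ties
                    best_a, best_v = a, v
            if best_a != actions[agent]:
                actions[agent] = best_a
                changed = True
        if not changed:  # no agent wants to deviate: locally optimal
            break
    return actions
```

As a usage example, take three agents on a line graph (0 - 1 - 2) with two actions each and the same pairwise term `[[1, 0], [0, 2]]` on both edges. Started from `[0, 0, 0]`, the sketch stays at `[0, 0, 0]` (joint value 2) even though `[1, 1, 1]` has joint value 4, which shows what "locally optimal" means here: no single agent can improve by deviating on its own. Started from `[1, 1, 0]`, it reaches `[1, 1, 1]`.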
Mechanical and Aerospace Engineering,
University of California, San Diego
9500 Gilman Dr,
La Jolla, California, 92093-0411
Ph: 1-858-822-7930
Fax: 1-858-822-3107
cortes at ucsd.edu
Skype id: jorgilliyo