Jorge Cortés

Professor

Cymer Corporation Endowed Chair





Multi-agent Q-learning via best choice dynamics
S. Addams, J. Cortés
IEEE Access, submitted


Abstract

Motivated by multi-agent Q-learning scenarios, this paper introduces a distributed action selection algorithm in which individual agents interact with their local neighbors to learn a joint action. In the algorithm, termed Best Choice Dynamics, each agent communicates its current planned action to its neighbors, who in turn use this information to update their own actions. We characterize the algorithm's convergence and its robustness to message losses, showing that it converges to locally optimal joint actions in finite time. We also discuss its advantages relative to message-passing algorithms and best response dynamics in terms of convergence guarantees, lack of oscillations, and communication complexity. We illustrate the algorithm's performance in various simulation scenarios, including both online training and offline training with distributed online rollout.


Mechanical and Aerospace Engineering, University of California, San Diego
9500 Gilman Dr, La Jolla, California, 92093-0411

Ph: 1-858-822-7930
Fax: 1-858-822-3107

cortes at ucsd.edu
Skype id: jorgilliyo