Tracking learning based on Gaussian regression for multi-agent systems in continuous spaces.

*(Chinese. English summary)* Zbl 1299.68048

Summary: Improving adaptation, achieving generalization in continuous spaces, and reducing dimensionality are regarded as the key issues in implementing multi-agent reinforcement learning in continuous systems. To address them, the paper presents a learning mechanism and an algorithm called model-based reinforcement learning with companions’ policy tracking for multi-agent systems (MAS MBRL-CPT). Starting from the goal of making the best response to its companions, the paper defines a new expected immediate reward, which merges the observation of the companions’ policy into the payoff feedback from the environment and whose value is estimated online by stochastic approximation. A \(Q\) value function of reduced dimension is then developed to set up a Markov decision process for strategy learning in a multi-agent environment. Based on a state transition model built by Gaussian regression, the \(Q\) values at a set of state-action samples are computed by dynamic programming; these values then serve as the base samples for generalizing the value function and the learned strategy. In a simulation of a multi-cart-pole system in continuous spaces, MBRL-CPT enables the learning agent to learn a tracking strategy for cooperating with its companions, even when the dynamics and the companions’ strategies are unknown. The results indicate that MBRL-CPT is efficient and generalizes well.
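
The online estimate of the expected immediate reward mentioned in the summary can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a single discrete state, two actions, a hypothetical matching payoff, and a companion that plays action 1 with probability 0.7. The names `ExpectedRewardEstimator` and `demo` are made up for illustration. Because the companion's actions are sampled from its own policy, a Robbins-Monro running average over the learner's own state-action pairs tends toward the reward expected under that policy, without storing the companion's action as an extra dimension.

```python
import random
from collections import defaultdict


class ExpectedRewardEstimator:
    """Running estimate of the reward expected for (state, own_action).

    The companions' actions are not part of the key, so they are
    averaged out implicitly by the stochastic approximation update.
    """

    def __init__(self):
        self.estimate = defaultdict(float)  # current estimate per (state, action)
        self.visits = defaultdict(int)      # number of samples per (state, action)

    def update(self, state, action, reward):
        key = (state, action)
        self.visits[key] += 1
        step = 1.0 / self.visits[key]       # decreasing step size 1/n
        self.estimate[key] += step * (reward - self.estimate[key])
        return self.estimate[key]


def demo(num_samples=20000, seed=0):
    """Hypothetical setting: reward 1 when the learner matches the companion."""
    rng = random.Random(seed)
    estimator = ExpectedRewardEstimator()
    for _ in range(num_samples):
        # Assumed companion policy, unknown to the learner: action 1 w.p. 0.7.
        companion_action = 1 if rng.random() < 0.7 else 0
        own_action = rng.choice([0, 1])
        reward = 1.0 if own_action == companion_action else 0.0
        estimator.update("s0", own_action, reward)
    return estimator.estimate[("s0", 0)], estimator.estimate[("s0", 1)]


if __name__ == "__main__":
    r0, r1 = demo()
    print(f"estimated reward for action 0: {r0:.3f}")
    print(f"estimated reward for action 1: {r1:.3f}")
```

In this toy setting the two estimates approach 0.3 and 0.7, so a greedy learner would pick action 1, the best response to the assumed companion policy. In continuous spaces the table lookup would have to be replaced by a function approximator, which is where the summary's Gaussian regression comes in.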