Deep Learning to Collude
Clemens Possnig*
Last modified: 2022-05-19
Abstract
This paper characterizes the limiting behavior of a general class of independent reinforcement learning (RL) algorithms in repeated games. We allow RL agents to learn state-dependent repeated-game strategies and show that the limit points of their independent learning process act as an equilibrium selection mechanism: asymptotic stability of equilibria of an underlying differential equation serves as the selection channel. Our class contains model-free actor-critic and gradient-based algorithms for continuous controls as special cases. We allow for bias in the critic (gradient) estimator, provide sufficient conditions, and give a full example of an algorithm in our class. Insights from this project can be used to determine under which conditions on the underlying game and algorithms collusive strategies may be learned. We argue that our framework opens up an important comparative statics exercise: determining which types of learners, under which market and payoff conditions, are more likely to arrive at collusion.