Reinforcement Learning and the Power Rule of Practice: Some Analytical Results
|Speaker:||Antonella Ianni, University of Southampton|
|Date:||Wednesday 1 November 2000|
|Location:||Room 106 Streatham Court|
Erev and Roth (AER 1998), among others, provide a comprehensive analysis of experimental evidence on learning in games, based on a stochastic model of learning that accounts for two main elements: the Law of Effect (positive reinforcement of actions that perform well) and the Power Law of Practice (learning curves tend to be steeper initially). This paper complements the above literature by providing an analytical study of the properties of such learning models. Specifically, path-dependent processes of individual learning, as well as societal evolution, are modelled by means of non-linear Polya urn processes. The paper shows that:
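As an informal illustration of this class of models (not code from the paper), the sketch below simulates a single agent under Erev-Roth style reinforcement learning against fixed, positive action payoffs. The chosen action's propensity is reinforced by its payoff (Law of Effect), and because choice probabilities are propensities normalised by a growing total, effective step sizes shrink over time (Power Law of Practice). The function name, parameters, and fixed-payoff assumption are illustrative choices, not taken from the paper.

```python
import random


def reinforcement_learning(payoffs, n_rounds=1000, initial_propensity=1.0, seed=0):
    """Simulate reinforcement learning with cumulative propensities.

    payoffs[a] is the (fixed, positive) payoff of action a. Each round,
    an action is drawn with probability proportional to its propensity,
    and the drawn action's propensity is reinforced by its payoff.
    Returns the final choice probabilities.
    """
    rng = random.Random(seed)
    q = [initial_propensity] * len(payoffs)  # initial propensities
    actions = range(len(payoffs))
    for _ in range(n_rounds):
        # Choice probabilities proportional to propensities (a Polya-urn draw)
        a = rng.choices(actions, weights=q)[0]
        # Law of Effect: positive reinforcement of the action just played
        q[a] += payoffs[a]
    total = sum(q)
    return [qi / total for qi in q]
```

Because the total propensity grows every round, a single reinforcement moves the choice probabilities by roughly 1/t at round t, which is the urn-process analogue of the Power Law of Practice mentioned above.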
a) up to an error term, the stochastic process is driven by a system of discrete-time difference equations of the replicator type. This carries an analogy with Borgers and Sarin (JET 1997), where reinforcement learning accounts only for the Law of Effect.
b) the system converges almost surely to the set of fixed points of the associated deterministic system and also locally follows its trajectories. This is mainly due to the fact that, by explicitly modelling the Power Law of Practice effect in the process, we are able to track the magnitude of the jumps of the stochastic process.
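In notation not taken from the paper, results (a) and (b) can be illustrated by a stochastic approximation of the form

```latex
x_{i,t+1} - x_{i,t} \;=\; \frac{1}{t + c}\, x_{i,t}\bigl[\pi_i(x_t) - \bar{\pi}(x_t)\bigr] \;+\; \epsilon_{i,t},
\qquad \epsilon_{i,t} = O(1/t),
```

where \(x_{i,t}\) is the probability weight on action \(i\) at time \(t\), \(\pi_i\) its expected payoff, \(\bar{\pi}\) the population-average payoff, and \(c\) a constant reflecting initial propensities; the exact form in the paper may differ. The drift term is the discrete-time replicator dynamic of (a), while the \(O(1/t)\) step size, induced by the Power Law of Practice, bounds the jumps of the process and underlies the almost-sure convergence argument in (b).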