Bayesian Experimentation and Learning Who to Treat
Speaker: Martin Cripps, UCL
Date: Friday 27 January 2017
Location: Matrix Lecture Theatre, Building One
This paper addresses the problem of a Bayesian policy maker learning the (unknown) set of subjects who should be treated by an economic policy. Subjects are sampled sequentially, and the effect of the policy is discontinuous on the boundary of the unknown policy set. This is modelled as a contextual bandit in which the arms are the subjects and the outcomes of these arms are correlated through the unknown optimal policy set. We show that, for every discount factor, the policy maker correctly learns the true set of subjects to be treated.