Bayesian Experimentation and Learning Who to Treat


Speaker: Martin Cripps, UCL
Date: Friday 27 January 2017
Time: 15.30
Location: Matrix Lecture Theatre, Building One

Further details

This paper addresses the problem of a Bayesian policy maker learning the (unknown) set of subjects who should be treated by an economic policy. Subjects are sampled sequentially, and the effect of the policy is discontinuous on the boundary of the unknown policy set. This is modelled as a contextual bandit in which the arms of the bandit are the subjects and the outcomes of these arms are correlated through the unknown optimal policy set. We show that, for every discount factor, the policy maker correctly learns the true set of subjects to be treated.