Dynamic choice over menus
Alejandro Francetich*
Last modified: 2014-04-05
Abstract
A decision maker faces a finite set X of alternatives and chooses subsets of X at every period t = 0, 1, 2, . . . Rewards are drawn from an unknown distribution, and the decision maker only observes the realized rewards of the alternatives in the chosen subset. The problem is a multi-armed bandit problem, where the arms are the subsets of X. These arms are not independent: Even if the rewards from individual elements are independent, those from intersecting subsets will be correlated. As we are unable to fully characterize optimal strategies in general dependent multi-armed bandit problems, I propose and analyze heuristics. Applications include hiring of teams of experts by professional-services firms.