Lucas Baker
@alpha
Suppose life is a multi-armed bandit (general stochastic, to be precise). That is a *solved problem*: the theoretical optimum is to pull whichever arm worked best most recently. Agree or disagree?
Varun Srinivasan
@v
what is the strategy exactly? reading it literally, it sounds like it would lead to pulling one arm and then repeating it forever. i assume there is also some explore component to it?
Lucas Baker
@alpha
Bad phrasing on my part; I meant Follow-the-Leader: always pull the arm with the largest observed average reward so far. https://www.di.ens.fr/appstat/fall-2018/TP/Bandits.pdf
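The Follow-the-Leader strategy under discussion can be sketched in a few lines. This is a minimal illustration, not anyone's reference implementation: it assumes Bernoulli arms (each arm pays 1 with a fixed probability), pulls every arm once to initialize, and then always picks the arm with the highest empirical average. The function name and interface are made up for this sketch.

```python
import random

def follow_the_leader(arms, horizon, seed=0):
    """Play a Bernoulli bandit with the Follow-the-Leader rule:
    pull each arm once, then always pull the arm whose observed
    average reward is largest so far. `arms` is a list of
    success probabilities (a hypothetical toy setup)."""
    rng = random.Random(seed)
    n = len(arms)
    counts = [0] * n      # pulls per arm
    sums = [0.0] * n      # cumulative reward per arm
    total = 0.0
    for t in range(horizon):
        if t < n:
            a = t  # initialization round: try every arm once
        else:
            # the "leader": arm with the best empirical mean
            a = max(range(n), key=lambda i: sums[i] / counts[i])
        reward = 1.0 if rng.random() < arms[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        total += reward
    return total, counts

# Note FTL has no explicit exploration: an unlucky first pull of the
# best arm can lock it onto a suboptimal arm forever, which is the
# failure mode the reply above is pointing at.
total, counts = follow_the_leader([0.2, 0.8], horizon=1000)
```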