Lucas Baker
@alpha
Suppose life is a multi-armed bandit (general stochastic, to be precise). That is a *solved problem*: the theoretical optimum is to pull whichever arm worked best most recently. Agree or disagree?

Varun Srinivasan
@v
What is the strategy, exactly? Reading it literally, it sounds like you'd pull one arm once and then repeat that same arm forever. I assume there is also some explore component to it?

Nick Chow
@nicholasachow
In a multiple iteration game, you’d want to mix in some randomness, wouldn’t you? Some variation on an epsilon-greedy strategy seems to work well.