Lucas Baker
@alpha
Suppose life is a multi-armed bandit (general stochastic, to be precise). That is a *solved problem*: the theoretical optimum is to pull whichever arm worked best most recently. Agree or disagree?

Varun Srinivasan
@v
What is the strategy, exactly? Reading it literally, it sounds like you'd pull one arm once and then repeat that same arm forever. I assume there is also some explore component to it?

Nick Chow
@nicholasachow
In a multiple iteration game, you’d want to mix in some randomness, wouldn’t you? Some variation on an epsilon-greedy strategy seems to work well.