July
@july
So:
- If it is rewarding, someone will do it, and someone will game the system
- Gaming the system means following the gradient
- The gradient reaches a local minimum
- Reaching the proxy minimum collapses the system
- The optimizer figured out the 'surrogate intent' even when the designer's true intent was something else

Then:
- Tokenomics, economics, or AI alignment in general works (unless value extraction before system collapse is your objective) only when the 'intent' is opaque or does not have a unique global minimum
- You need to set up a non-convex problem, where the intent is explicitly unlearnable

So the goal should be:
- Don't make the intent fully learnable, ever
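The failure mode above can be sketched as a toy gradient descent: the optimizer minimizes a measurable proxy, reaches the proxy's minimum, and the designer's true objective ends up worse, not better. The loss functions and constants here are invented purely for illustration.

```python
# Toy Goodhart demo (hypothetical objectives, chosen for illustration):
# the designer's true intent is x near 3, but the measurable proxy
# rewards x near 10. The optimizer only ever sees the proxy.

def true_loss(x):
    """What the designer actually wants minimized."""
    return (x - 3.0) ** 2

def proxy_loss(x):
    """What the system actually rewards."""
    return (x - 10.0) ** 2

def proxy_grad(x):
    """Gradient of the proxy, the only signal the optimizer follows."""
    return 2.0 * (x - 10.0)

x = 0.0
for _ in range(500):
    x -= 0.05 * proxy_grad(x)   # follow the gradient of the proxy

# The optimizer sits at the proxy's minimum (x ~ 10, proxy_loss ~ 0) ...
print(round(x, 3), round(proxy_loss(x), 6))
# ... while the true objective has gotten strictly worse than at x = 3.
print(round(true_loss(x), 3))
```

The proxy is convex with a unique global minimum, so the 'surrogate intent' is fully learnable and the optimizer finds it exactly, which is the point of the last bullet: if the intent were not uniquely recoverable from the reward signal, this collapse would not be guaranteed.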

July
@july
This is surprisingly counterintuitive