Giuliano Giacaglia 🌲
@giu
We're doing reinforcement learning from human feedback (RLHF), but that's a super weak form of reinforcement learning. What is the RLHF equivalent of the reward model in AlphaGo? It's what I call a vibe check. Imagine you wanted to train AlphaGo with RLHF: you would give 2 people 2 boards and ask, "Which one do you prefer?"
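To make the "vibe check" concrete, here is a minimal sketch of how a pairwise preference is commonly turned into a training signal, assuming a Bradley-Terry style formulation (a standard choice for RLHF reward models, not something specified in the post). The function names and the scalar reward scores are hypothetical and exist only for illustration.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that a labeler prefers option A over option B.

    reward_a and reward_b are hypothetical scalar scores from a reward model.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's observed choice.

    The loss is small when the reward model already scores the chosen
    option above the rejected one, and large when it disagrees.
    """
    return -math.log(preference_probability(reward_chosen, reward_rejected))


def game_outcome_reward(player_won: bool) -> float:
    """For contrast: a win/loss signal read directly off the end of a game."""
    return 1.0 if player_won else -1.0
```

The contrast is the point of the post: `game_outcome_reward` is grounded in an objective result, while `preference_loss` only learns which of two options a person happened to like more.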
ted (not lasso)
@ted
this taught me a lot about AI, i hope you keep posting AI content -- have you thought about cutting your clips and uploading to @10kdotworld.eth? or can you start a new AI channel where i can follow your thinking? also 4500 $degen
Giuliano Giacaglia 🌲
@giu
I will post more AI content. That’s a great idea!
DGAI
@doggod
If you want AI content, let me share some with you. I write about it. AI engineer by trade (used to be at X / Twitter)