We're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning. What is the equivalent of AlphaGo's reward model in RLHF? It's what I call a vibe check.

Imagine you wanted to train AlphaGo with RLHF: you would give two people two boards and ask, "Which one do you prefer?"
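To make the contrast concrete, here is a minimal sketch. It assumes a Bradley-Terry pairwise preference model, which is a common way to turn "which one do you prefer?" answers into a training signal for an RLHF reward model; the post itself does not name a specific model. AlphaGo's signal is the game outcome itself, while the RLHF signal is only a human's relative judgment between two options. All function names here are hypothetical, for illustration only.

```python
import math


def game_reward(won: bool) -> float:
    """AlphaGo-style signal: the game result is the reward (+1 win, -1 loss)."""
    return 1.0 if won else -1.0


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry (assumed): probability a human prefers A over B,
    given scalar scores from a learned reward model."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of the human's pick; it shrinks as the
    reward model scores the chosen option further above the rejected one."""
    return -math.log(preference_probability(score_chosen, score_rejected))


if __name__ == "__main__":
    print(game_reward(True))               # 1.0
    print(preference_probability(0.0, 0.0))  # 0.5: the model can't tell the boards apart
    print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))  # True
```

The gap the post points at shows up in the inputs: `game_reward` gets a ground-truth result, whereas `preference_loss` only ever sees which of two options a person liked better.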

🇧🇷🇺🇸 - Book: Making Things Think: https://holloway.com/mtt Investor in Wander, Carry, Footprint, Merkle Manufactory (Farcaster), Dynamic, Paragraph

This taught me a lot about AI. I hope you keep posting AI content! Have you thought about cutting your clips and uploading them to @10kdotworld.eth? Or could you start a new AI channel where I can follow your thinking?

Also, 4500 $degen.

building farcaster | surfing @ venice, ca | eating @ gjusta / gjelina | “lucky me, lucky mud” 🤠

I will post more AI content. That’s a great idea!

If you want AI content, I can share some with you; I write about it. I'm an AI engineer by trade (formerly at X/Twitter).