Giuliano Giacaglia
@giu
We're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning. What is the equivalent reward model in AlphaGo for RLHF? It's what I call a vibe check. Imagine if you wanted to train AlphaGo with RLHF: you would show a person two boards and ask, "Which one do you prefer?"
2 replies
7 recasts
34 reactions
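The "vibe check" in the cast above can be made concrete. In RLHF, the reward model is typically trained on pairwise human preferences with a Bradley-Terry style loss, rather than on a ground-truth outcome like a win or loss in Go. A minimal sketch (the `preference_loss` function and the scalar scores are illustrative assumptions, not any specific library's API):

```python
import math

# Sketch of the pairwise-preference objective behind RLHF reward models:
# a human picks which of two outputs they prefer, and the reward model
# is trained so the preferred output scores higher. The scalar scores
# here stand in for outputs of a hypothetical learned reward network.

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(r_preferred - r_rejected)."""
    diff = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# The loss shrinks as the preferred output's score pulls ahead of the
# rejected one, and grows when the model ranks them the wrong way round.
low = preference_loss(2.0, 0.0)   # model agrees with the human preference
high = preference_loss(0.0, 2.0)  # model contradicts the human preference
```

Contrast this with AlphaGo, where the reward is a grounded win/loss signal from the rules of the game; the cast's point is that a preference between two boards would be a far noisier and weaker training signal.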

ted (DO NOT DC ME)
@ted
this taught me a lot about AI, i hope you keep posting AI content -- have you thought about cutting your clips and uploading to @drakula? or can you start a new AI channel where i can follow your thinking? also 4500 $degen
1 reply
0 recast
1 reaction

yangwao ↑
@yangwao
Waiting till we have an LLM with Asperger's
0 reply
0 recast
1 reaction