Jason on Warpcast

adrienne

I worked on the transcript pipeline of my GM Farcaster AI bot project today. The pipeline is important because access to transcripts is what sets @gmfc101 apart from other bots, so I need to make it ridiculously easy to upload new transcripts.

Pipeline steps:
- generate transcript from YouTube video
- clean transcript for known spelling issues like NanishPrav and dGen
- break into chunks
- create embeddings from chunks
- upload embeddings to vector database

Updates:
- added better support for batch processing
- can now safely reprocess videos that have already been loaded into the database, necessary for when I go back and work on improving the quality of transcripts

Pipeline is solid and working smoothly, and as of now, @gmfc101 has access to transcripts from 170 episodes. I plan to upload the remaining episodes over the next few days. Once that’s complete, I’ll shift my focus to enhancing the quality of the bot's replies.
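For readers who want a concrete picture, here is a minimal Python sketch of the clean, chunk, embed, and upload steps, including one way to make reprocessing safe. It is not the actual @gmfc101 code. Everything named here is an assumption for illustration: the `KNOWN_FIXES` misspellings, the word-window chunking with overlap, `fake_embed` (a hash-based stand-in for a real embedding model, not semantic), and a plain dict standing in for the vector database.

```python
import hashlib
import re

# Hypothetical corrections table: misheard spelling -> canonical spelling.
KNOWN_FIXES = {
    "nanish prav": "NanishPrav",
    "d gen": "dGen",
}


def clean_transcript(text: str) -> str:
    """Replace known misspellings, case-insensitively, on word boundaries."""
    for wrong, right in KNOWN_FIXES.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    return text


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of at most `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the transcript
    return chunks


def fake_embed(chunk: str, dims: int = 8) -> list[float]:
    """Stand-in for a real embedding model: deterministic, NOT semantic."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def load_video(store: dict, video_id: str, raw_transcript: str, **chunk_kwargs) -> int:
    """Process one video into `store` (a dict standing in for a vector database).

    Safe to call again for the same video: old entries are removed first,
    then chunks are written under deterministic IDs of the form
    "<video_id>:<chunk index>".
    """
    prefix = f"{video_id}:"
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    chunks = chunk_words(clean_transcript(raw_transcript), **chunk_kwargs)
    for index, chunk in enumerate(chunks):
        store[f"{prefix}{index}"] = {"text": chunk, "vector": fake_embed(chunk)}
    return len(chunks)
```

Deleting a video's old entries before writing matters when an improved transcript produces fewer chunks than the previous run. Overwriting by ID alone would leave the extra chunks from the old version in the database.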

KMac🍌 ⏩

I'm forgetting whether you used a framework or rolled your own. Where can we find more info about your project?

adrienne

Rolled my own. I heard Eliza was overkill for what I wanted to do. I've written a bit about it on my Substack: https://someofthethings.substack.com/p/building-a-farcaster-ai-agent-part

Jason

Also, I didn’t see much there to help with the RAG piece.

adrienne

Yeah, exactly. Maybe I can open source my pipeline so other content creators can RAG their content.
