adrienne
@adrienne
I worked on the transcript pipeline of my GM Farcaster AI bot project today. The pipeline is important because access to transcripts is what sets @gmfc101 apart from other bots so I need to make it ridiculously easy to upload new transcripts.

Pipeline steps:
- generate transcript from youtube video
- clean transcript for known spelling issues like NanishPrav and dGen
- break into chunks
- create embeddings from chunks
- upload embeddings to vector database

Updates:
- added better support for batch processing
- can now safely reprocess videos that have already been loaded into the database, necessary for when I go back and work on improving the quality of transcripts

Pipeline is solid and working smoothly, and as of now, @gmfc101 has access to transcripts from 170 episodes. I plan to upload the remaining episodes over the next few days. Once that’s complete, I’ll shift my focus to enhancing the quality of the bot's replies.
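
For anyone curious what the clean, chunk, and "safe to reprocess" steps might look like, here is a rough Python sketch. It is not the actual @gmfc101 code. The misheard spellings in `CORRECTIONS`, the chunk sizes, the `fake_embed` stand-in, and the plain dict used in place of a vector database are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical table: misheard form -> intended spelling.
CORRECTIONS = {
    "nanish prav": "NanishPrav",
    "d gen": "dGen",
}


def clean_transcript(text, corrections=CORRECTIONS):
    """Replace known misspellings, ignoring case, on word boundaries."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(right, text)
    return text


def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping chunks of at most `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


def fake_embed(text):
    """Stand-in for a real embedding model: 8 floats derived from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:8]]


def process_video(video_id, raw_transcript, store):
    """Clean, chunk, embed, and load one video into `store`.

    `store` is a dict standing in for a vector database. Old entries for
    the same video are dropped first, so rerunning a video replaces its
    chunks instead of duplicating them.
    """
    stale = [key for key, row in store.items() if row["video_id"] == video_id]
    for key in stale:
        del store[key]

    chunks = chunk_words(clean_transcript(raw_transcript))
    for index, chunk in enumerate(chunks):
        store[f"{video_id}:{index}"] = {
            "video_id": video_id,
            "text": chunk,
            "embedding": fake_embed(chunk),
        }
    return len(chunks)
```

Deleting a video's old entries before loading the new ones is one way to make reprocessing safe: if an improved transcript produces fewer chunks than before, no leftover chunks from the old version stay behind.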

KMac🍌 ⏩ ツ
@kmacb.eth
I'm forgetting whether you used a framework or rolled your own. Where can we find more info about your project?

adrienne
@adrienne
Rolled my own. I heard Eliza was overkill for what I wanted to do.

I've written a bit about it on my Substack: https://someofthethings.substack.com/p/building-a-farcaster-ai-agent-part

Jason
@jachian
I also didn’t see much in Eliza that would help you with the RAG piece.

adrienne
@adrienne
Yeah, exactly. Maybe I can open-source my pipeline so other content creators can RAG their content.