adrienne
@adrienne
I worked on the transcript pipeline of my GM Farcaster AI bot project today. The pipeline is important because access to transcripts is what sets @gmfc101 apart from other bots so I need to make it ridiculously easy to upload new transcripts.

Pipeline steps:
- generate transcript from YouTube video
- clean transcript for known spelling issues like NanishPrav and dGen
- break into chunks
- create embeddings from chunks
- upload embeddings to vector database

Updates:
- added better support for batch processing
- can now safely reprocess videos that have already been loaded into the database, necessary for when I go back and work on improving the quality of transcripts

Pipeline is solid and working smoothly, and as of now, @gmfc101 has access to transcripts from 170 episodes. I plan to upload the remaining episodes over the next few days. Once that’s complete, I’ll shift my focus to enhancing the quality of the bot's replies.
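
For readers curious how the clean, chunk, embed, and upload steps and the safe-reprocessing update could fit together, here is a minimal sketch. It is not the actual @gmfc101 code. Everything named here is an assumption made for illustration: the `SPELLING_FIXES` table (including which misspellings occur), the word-based chunk sizes, the `fake_embed` stand-in for a real embedding model, and the plain dict standing in for a vector database. Reprocessing is made safe by deleting a video's old chunks before writing new ones under deterministic IDs.

```python
import hashlib
import re

# Hypothetical correction table: the misspelled forms are invented examples.
SPELLING_FIXES = {
    "nanish prav": "NanishPrav",
    "degen": "dGen",
}


def clean_transcript(text: str) -> str:
    """Replace known misspellings, matching whole words case-insensitively."""
    for wrong, right in SPELLING_FIXES.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    return text


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reached the end of the transcript
        start += size - overlap
    return chunks


def fake_embed(chunk: str) -> list[float]:
    """Stand-in for a real embedding model: derives 8 numbers from a hash."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]


def process_video(video_id, raw_text, store, embed=fake_embed, size=200, overlap=20):
    """Clean, chunk, embed, and store one transcript. Safe to run repeatedly."""
    # Remove chunks from any earlier run so a shorter transcript leaves no stale rows.
    for key in [k for k, v in store.items() if v["video_id"] == video_id]:
        del store[key]
    chunks = chunk_words(clean_transcript(raw_text), size, overlap)
    for index, chunk in enumerate(chunks):
        # Deterministic ID: the same video and position always map to the same key.
        store[f"{video_id}:{index}"] = {
            "video_id": video_id,
            "text": chunk,
            "vector": embed(chunk),
        }
    return len(chunks)


def process_batch(transcripts, store, **kwargs):
    """Process many transcripts given as a {video_id: raw_text} mapping."""
    return {vid: process_video(vid, text, store, **kwargs) for vid, text in transcripts.items()}
```

Calling `process_video` a second time for the same `video_id` replaces that video's entries instead of duplicating them, which is the property that makes it practical to go back and improve transcripts later. A real vector database would replace the dict, typically through its own delete and upsert operations.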

Mostafa 🎩
@mostafa1992
🔥🔥