hellno the optimist
@hellno.eth
I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps
- have past user input and LLM responses, a RAG system for docs and LLM prompts
- want to run integration tests as a black box: user input → does the output roughly include what I want?
need recommendations:
- python frameworks (deepeval seems like a good fit?!)
- best practices for how to set this up and keep improving it as a black box
- best practices to improve the core RAG system
6 replies
1 recast
11 reactions
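Not part of the thread, but a minimal sketch of the black-box check described above, assuming a hypothetical generate_mini_app(prompt) entry point and hand-written expected keywords per prompt; the simple keyword check could later be swapped for an LLM-graded metric from a framework like deepeval:

```python
# black_box_eval.py - rough sketch of a black-box integration check:
# user input -> generated mini app -> does the output roughly include what we want?
# generate_mini_app, the test cases, and the keywords are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class EvalCase:
    user_input: str          # past user prompt
    must_include: list[str]  # strings the generated app should roughly contain

CASES = [
    EvalCase(
        user_input="build a farming game mini app with crops and land",
        must_include=["farcaster", "sdk.actions.ready", "crops"],
    ),
]

def run_case(case: EvalCase, generate_mini_app) -> dict:
    """Treat generation as a black box and score keyword coverage of the output."""
    output = generate_mini_app(case.user_input)  # e.g. concatenated generated files
    hits = [kw for kw in case.must_include if kw.lower() in output.lower()]
    return {
        "input": case.user_input,
        "coverage": len(hits) / len(case.must_include),
        "missing": [kw for kw in case.must_include if kw not in hits],
    }

if __name__ == "__main__":
    # fake generator so the sketch runs standalone; replace with the real pipeline
    fake = lambda prompt: "farcaster mini app ... sdk.actions.ready() ... crops and land"
    print(run_case(CASES[0], fake))
```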

hellno the optimist
@hellno.eth
summoning the AI gurus @sidshekhar @jtgi @alexpaden ideas for any of this?
2 replies
0 recast
5 reactions

shoni.eth
@alexpaden
deepeval seems like a good fit - some SaaS options are prompteval, promptlayer, and Weights & Biases.
Regarding RAG, my first thought was full-fledged app codebases; off the top of my head I don't see a benefit in anything smaller than a full code file.
Regarding best practices, basically just unit/app test cases with an expected output is what comes to mind. Jason's answer looks good, I wouldn't overcomplicate it.
1 reply
0 recast
1 reaction
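One lightweight way to realize the "test cases with an expected output" idea is a pytest-parametrized table that reuses the harness sketched earlier; the import paths, case list, and coverage threshold are assumptions to tune:

```python
# test_generation.py - sketch of unit/app-style cases with an expected output,
# assuming the EvalCase/run_case helpers from the black-box harness sketch above.
import pytest
from black_box_eval import CASES, run_case
from my_pipeline import generate_mini_app  # hypothetical real entry point

@pytest.mark.parametrize("case", CASES, ids=lambda c: c.user_input[:40])
def test_output_roughly_matches(case):
    result = run_case(case, generate_mini_app)
    # "roughly includes what I want": require most expected keywords, not all,
    # so small wording changes in generations don't make the suite flaky
    assert result["coverage"] >= 0.7, f"missing: {result['missing']}"
```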

hellno the optimist
@hellno.eth
thank you! can you share more on RAG recommendations? any blog posts or other resources to consume?
2 replies
0 recast
0 reaction

shoni.eth
@alexpaden
since RAG is most often just referring to similarity search, you may want to rewrite the user's request with another LLM oriented around what your embeddings actually represent.
Embedding: "here is an app on farm game for farcaster"
User request: "how can I build a good game that uses crops and land in a farming format while also connected to crypto"
Reformatted: "a game about farming crops and land using ethereum"
Results:
> farm game app example
> using ethereum in mini app example
> some lesser quality example, i.e. neynar apis
You can also use the LLM to parse their request into actual queries, whether that's similarity/vector search or normal SQL search (db dependent).
TLDR: RAG is just search options. So what are you hoping to find, when, and why? Small OpenAI embeddings are plenty unless you have 100k rows in your db.
1 reply
0 recast
0 reaction
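A runnable sketch of this rewrite-then-search idea (not from the thread), assuming OpenAI's small embedding model and an in-memory list of doc strings; the chat model name and the system prompt are placeholders, and the brute-force cosine search stands in for whatever vector store is actually used:

```python
# rag_rewrite.py - sketch of rewriting the request toward what the embeddings
# represent, then doing similarity search; model names and docs are assumptions.
from openai import OpenAI
import numpy as np

client = OpenAI()

def rewrite_query(user_request: str) -> str:
    """Ask an LLM to restate the request in the vocabulary of the embedded docs."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the request as a short search query phrased like "
                        "a Farcaster mini app example description."},
            {"role": "user", "content": user_request},
        ],
    )
    return resp.choices[0].message.content.strip()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def search(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Rank docs by cosine similarity to the query embedding."""
    vecs = embed([query] + docs)
    q, d = vecs[0], vecs[1:]
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:top_k]]

# usage: search(rewrite_query("how can I build a farming game connected to crypto"), docs)
```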

hellno the optimist
@hellno.eth
yeah, I have an LLM turn a user prompt into technical search queries, then run those queries against the RAG. will message you with more details :) thank you!
0 reply
0 recast
1 reaction
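The "turn a user prompt into technical search queries" step could look roughly like this sketch, which builds on the search helper from the previous one; the prompt wording, model, and query count are assumptions, and the merged results are simply de-duplicated:

```python
# multi_query.py - sketch of decomposing one user prompt into several technical
# queries and merging retrieval results; reuses search() from rag_rewrite above.
import json
from openai import OpenAI
from rag_rewrite import search

client = OpenAI()

def to_search_queries(user_prompt: str, n: int = 3) -> list[str]:
    """Have the LLM emit a small JSON list of technical search queries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": f"Return JSON like {{\"queries\": [...]}} with up to {n} "
                        "technical search queries for mini app docs and examples."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return json.loads(resp.choices[0].message.content)["queries"][:n]

def retrieve(user_prompt: str, docs: list[str]) -> list[str]:
    """Run every generated query against the RAG and de-duplicate the hits."""
    seen, merged = set(), []
    for q in to_search_queries(user_prompt):
        for doc in search(q, docs):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```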