hellno the optimist
@hellno.eth
I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps
- have past user input and LLM responses, plus a RAG system for docs and LLM prompts
- want to run integration tests as a black box: user input → does the output roughly include what I want?
need recommendations:
- python frameworks (deepeval seems like a good fit?!)
- best practices for how to set this up and keep improving it as a black box
- best practices to improve the core RAG system
6 replies
1 recast
11 reactions
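The black-box check described in the cast ("user input → does output roughly include what I want") can be sketched without any framework. This is a minimal, hedged sketch: `generate_mini_app` would be the real generation pipeline (stubbed here), and required-keyword coverage stands in for whatever "roughly includes" metric (e.g. a deepeval LLM-judged metric) gets used later.

```python
# Minimal black-box eval sketch: replay stored user inputs through the
# generation pipeline and score whether the output "roughly includes"
# what we expect, via required-keyword coverage. The generator passed
# to run_eval is a hypothetical stand-in for the real pipeline.
from dataclasses import dataclass

@dataclass
class EvalCase:
    user_input: str
    required_keywords: list[str]  # things the generated app should mention

def keyword_coverage(output: str, required: list[str]) -> float:
    """Fraction of required keywords present in the output (case-insensitive)."""
    text = output.lower()
    hits = sum(1 for kw in required if kw.lower() in text)
    return hits / len(required) if required else 1.0

def run_eval(generate, cases: list[EvalCase], threshold: float = 0.8):
    """Return (passed, failed) lists; a case passes if coverage >= threshold."""
    passed, failed = [], []
    for case in cases:
        output = generate(case.user_input)
        score = keyword_coverage(output, case.required_keywords)
        (passed if score >= threshold else failed).append((case, score))
    return passed, failed

# Usage with a stubbed generator (real one would call the LLM pipeline):
cases = [EvalCase("build a todo app", ["todo", "add", "delete"])]
passed, failed = run_eval(lambda q: "A todo app where you add and delete items", cases)
```

Swapping `keyword_coverage` for an LLM-as-judge metric later keeps the harness shape intact: same replayed inputs, same pass/fail threshold, different scorer.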
hellno the optimist
@hellno.eth
summoning the AI gurus @sidshekhar @jtgi @alexpaden ideas for any of this?
2 replies
0 recast
5 reactions
shoni.eth
@alexpaden
deepeval seems like a good fit. some saas options are prompteval, promptlayer, and weights & biases. regarding RAG, my first thought was full-fledged app codebases; i don't see a benefit in chunking anything smaller than a full code file off the top of my head. regarding best practices, basically just unit/app test cases with an expected output is what comes to mind. jason's answer looks good, i wouldn't overcomplicate it
1 reply
0 recast
1 reaction
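On the RAG side, the "keep improving it" loop from the original cast usually means labeling a small set of queries with the doc chunks they should retrieve and tracking recall@k on every change. This is a hedged sketch: `retrieve` stands in for the real retriever (stubbed here with keyword matching), and the doc ids are invented for illustration.

```python
# Track RAG retrieval quality over time: label a handful of queries with
# the doc chunks they *should* surface, then measure recall@k after each
# change to chunking, embeddings, or prompts. The retriever below is a
# hypothetical stub; the real one would query the vector store.

def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int = 5) -> float:
    """Fraction of relevant doc ids found in the top-k retrieved ids."""
    if not relevant_ids:
        return 1.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def eval_retriever(retrieve, labeled_queries, k: int = 5) -> float:
    """Average recall@k over (query, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve(q), rel, k) for q, rel in labeled_queries]
    return sum(scores) / len(scores)

# Usage with stubbed doc ids (illustrative only):
labeled = [("frame metadata", ["doc-frames"]), ("wallet connect", ["doc-wallet"])]
stub = lambda q: ["doc-frames"] if "frame" in q else ["doc-wallet", "doc-misc"]
avg = eval_retriever(stub, labeled)
```

This fits shoni's "unit/app test cases with an expected output" advice: the labeled pairs are exactly those test cases, applied to the retrieval layer instead of the final output.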
hellno the optimist
@hellno.eth
thank you! can you share more on RAG recommendations? any blog posts or other resources to consume?
2 replies
0 recast
0 reaction