hellno the optimist
@hellno.eth
I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps.
- have past user input and LLM responses, plus a RAG system for docs and LLM prompts
- want to run integration tests as a black box: user input → does the output roughly include what I want? (see the sketch below)
Need recommendations:
- python frameworks (deepeval seems like a good fit?!)
- best practices for how to set up and keep improving this as a black box
- best practices to improve the core RAG system
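A minimal sketch of that black-box shape, assuming deepeval's LLMTestCase / GEval API; generate_mini_app and the sample input are placeholders for the real generation pipeline, so check the current deepeval docs before relying on exact signatures.

```python
# Black-box eval sketch, assuming deepeval's LLMTestCase / GEval API.
# generate_mini_app() is a placeholder for the real prompt -> RAG -> LLM pipeline.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval

def generate_mini_app(user_input: str) -> str:
    """Placeholder: call the actual mini app generation pipeline here."""
    raise NotImplementedError

def test_counter_mini_app():
    user_input = "build a mini app that counts button clicks"
    output = generate_mini_app(user_input)

    # LLM-as-judge metric: does the output roughly include what was asked for?
    roughly_includes = GEval(
        name="roughly includes requested features",
        criteria="The generated code should implement a click counter with a button and a visible count.",
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        threshold=0.7,
    )
    assert_test(LLMTestCase(input=user_input, actual_output=output), [roughly_includes])
```

Run under pytest, each case gives a pass/fail that is easy to re-run as prompts and RAG chunks change.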
6 replies
1 recast
11 reactions

hellno the optimist
@hellno.eth
summoning the AI gurus @sidshekhar @jtgi @alexpaden: ideas for any of this?
2 replies
0 recasts
5 reactions

shoni.eth
@alexpaden
deepeval seems like a good fit. Some SaaS options are prompteval, promptlayer, and weights&biases. Regarding RAG, my first thought was full-fledged app codebases; off the top of my head I don't see the benefit in indexing anything smaller than a full code file. Regarding best practices, basically just unit/app test cases with an expected output is what comes to mind. Jason's answer looks good; I wouldn't overcomplicate it.
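The RAG side can be evaluated separately from generation. A rough sketch under assumptions: retrieve_docs and the query/expected-file pairs are hypothetical placeholders, not part of the actual system.

```python
# Retrieval-only eval sketch: for known queries, does the RAG system surface
# the expected doc or code file in its top-k results?
# retrieve_docs() and RETRIEVAL_CASES are placeholders.

RETRIEVAL_CASES = [
    {"query": "how do I add a frame button", "expected_file": "docs/frames/buttons.md"},
    {"query": "persist state between sessions", "expected_file": "examples/storage-app/index.tsx"},
]

def retrieve_docs(query: str, k: int = 5) -> list[str]:
    """Placeholder: return paths/ids of the top-k retrieved chunks."""
    raise NotImplementedError

def recall_at_k(cases: list[dict], k: int = 5) -> float:
    hits = sum(case["expected_file"] in retrieve_docs(case["query"], k) for case in cases)
    return hits / len(cases)

if __name__ == "__main__":
    print(f"recall@5: {recall_at_k(RETRIEVAL_CASES):.2f}")
```

Tracking a number like recall@k over time makes it easier to tell whether chunking changes (e.g. full files vs. smaller chunks) actually help.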
1 reply
0 recasts
1 reaction

Jason
@jachian
Perhaps a different approach: if possible, describe what should happen as your test cases, and make it as easy as possible to run against those test cases. I think @kevinoconnell has some good opinions on this topic as well.
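One sketch of that "describe what should happen" shape: test cases as plain data plus a trivial runner, so adding a case is one line. The cases, generate_mini_app, and the naive keyword check are all placeholders; the check could later be swapped for an LLM judge or a deepeval metric.

```python
# Sketch: expected behaviour described as data, with a minimal runner.
# generate_mini_app() and the cases are placeholders.

CASES = [
    {"input": "build a tip jar mini app", "must_include": ["button", "amount"]},
    {"input": "make a poll with two options", "must_include": ["option", "vote"]},
]

def generate_mini_app(user_input: str) -> str:
    raise NotImplementedError  # plug in the real pipeline

def check_output(output: str, must_include: list[str]) -> bool:
    # naive substring check; swap for an LLM judge later
    return all(term.lower() in output.lower() for term in must_include)

def run() -> None:
    for case in CASES:
        output = generate_mini_app(case["input"])
        status = "PASS" if check_output(output, case["must_include"]) else "FAIL"
        print(f'{status}: {case["input"]}')

if __name__ == "__main__":
    run()
```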
3 replies
0 recasts
3 reactions