I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps

- I have past user inputs and LLM responses, plus a RAG system for docs and LLM prompts.
- I want to run integration tests as a black box: user input → does the output roughly include what I want?
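
A rough sketch of what I mean by the black-box check, in plain Python with no framework. Everything named here is hypothetical: `EvalCase`, `score_case`, `run_eval`, and especially `fake_generate`, which stands in for the real pipeline (RAG + prompts + LLM). "Roughly include" is approximated as case-insensitive phrase matching, which is an assumption; an LLM-as-judge metric could replace it later.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    user_input: str
    expected_phrases: List[str]  # things the output should roughly include


def score_case(output: str, case: EvalCase) -> float:
    """Fraction of expected phrases found in the output (case-insensitive)."""
    if not case.expected_phrases:
        return 1.0
    text = output.lower()
    hits = sum(1 for p in case.expected_phrases if p.lower() in text)
    return hits / len(case.expected_phrases)


def run_eval(
    generate: Callable[[str], str],
    cases: List[EvalCase],
    threshold: float = 0.5,
) -> dict:
    """Treat `generate` as a black box: only inputs and outputs are inspected."""
    scores = [score_case(generate(c.user_input), c) for c in cases]
    passed = sum(1 for s in scores if s >= threshold)
    return {"scores": scores, "pass_rate": passed / len(cases) if cases else 0.0}


def fake_generate(user_input: str) -> str:
    # Hypothetical stand-in for the real pipeline; swap in the actual call.
    return f"Here is a mini app with a button and a wallet connect flow for: {user_input}"


cases = [
    EvalCase("tip jar app", ["button", "wallet"]),
    EvalCase("poll app", ["vote", "button"]),
]
report = run_eval(fake_generate, cases, threshold=0.75)
print(report)
```

The cases could be seeded from the past user inputs, so the pass rate can be tracked across prompt or RAG changes.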

I need recommendations on:
- Python frameworks (deepeval seems like a good fit?!)
- best practices for setting this up and continuing to improve it as a black box
- best practices for improving the core RAG system

cc @pirosb3 and @linda, you guys probably have good insights here too