hellno the optimist
@hellno.eth
I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps
- have past user input and LLM responses, plus a RAG system for docs and LLM prompts
- want to run integration tests as a black box: user input → does the output roughly include what I want?
need recommendations:
- python frameworks (deepeval seems like a good fit?!)
- best practices for how to set this up and keep improving it as a black box
- best practices to improve the core RAG system
6 replies
1 recast
11 reactions
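The black-box check described in the cast ("user input → does output roughly include what I want") can be sketched without any framework. This is a minimal, hedged sketch: `generate_mini_app` would be the real generation pipeline (stubbed here), and required-keyword coverage stands in for whatever "roughly includes" metric (e.g. a deepeval LLM-judged metric) gets used later.

```python
# Minimal black-box eval sketch: replay stored user inputs through the
# generation pipeline and score whether the output "roughly includes"
# what we expect, via required-keyword coverage. The generator passed
# to run_eval is a hypothetical stand-in for the real pipeline.
from dataclasses import dataclass

@dataclass
class EvalCase:
    user_input: str
    required_keywords: list[str]  # things the generated app should mention

def keyword_coverage(output: str, required: list[str]) -> float:
    """Fraction of required keywords present in the output (case-insensitive)."""
    text = output.lower()
    hits = sum(1 for kw in required if kw.lower() in text)
    return hits / len(required) if required else 1.0

def run_eval(generate, cases: list[EvalCase], threshold: float = 0.8):
    """Return (passed, failed) lists; a case passes if coverage >= threshold."""
    passed, failed = [], []
    for case in cases:
        output = generate(case.user_input)
        score = keyword_coverage(output, case.required_keywords)
        (passed if score >= threshold else failed).append((case, score))
    return passed, failed

# Usage with a stubbed generator (real one would call the LLM pipeline):
cases = [EvalCase("build a todo app", ["todo", "add", "delete"])]
passed, failed = run_eval(lambda q: "A todo app where you add and delete items", cases)
```

Swapping `keyword_coverage` for an LLM-as-judge metric later keeps the harness shape intact: same replayed inputs, same pass/fail threshold, different scorer.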
hellno the optimist
@hellno.eth
summoning the AI gurus @sidshekhar @jtgi @alexpaden ideas for any of this?
2 replies
0 recast
5 reactions
shoni.eth
@alexpaden
deepeval seems like a good fit. some saas options are prompteval, promptlayer, and weights & biases. regarding RAG, my first thought was full-fledged app codebases; i don't see a benefit in chunking anything smaller than a full code file off the top of my head. regarding best practices, basically just unit/app test cases with an expected output is what comes to mind. jason's answer looks good, i wouldn't overcomplicate it
1 reply
0 recast
1 reaction
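On the RAG side, the "keep improving it" loop from the original cast usually means labeling a small set of queries with the doc chunks they should retrieve and tracking recall@k on every change. This is a hedged sketch: `retrieve` stands in for the real retriever (stubbed here with keyword matching), and the doc ids are invented for illustration.

```python
# Track RAG retrieval quality over time: label a handful of queries with
# the doc chunks they *should* surface, then measure recall@k after each
# change to chunking, embeddings, or prompts. The retriever below is a
# hypothetical stub; the real one would query the vector store.

def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int = 5) -> float:
    """Fraction of relevant doc ids found in the top-k retrieved ids."""
    if not relevant_ids:
        return 1.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def eval_retriever(retrieve, labeled_queries, k: int = 5) -> float:
    """Average recall@k over (query, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve(q), rel, k) for q, rel in labeled_queries]
    return sum(scores) / len(scores)

# Usage with stubbed doc ids (illustrative only):
labeled = [("frame metadata", ["doc-frames"]), ("wallet connect", ["doc-wallet"])]
stub = lambda q: ["doc-frames"] if "frame" in q else ["doc-wallet", "doc-misc"]
avg = eval_retriever(stub, labeled)
```

This fits shoni's "unit/app test cases with an expected output" advice: the labeled pairs are exactly those test cases, applied to the retrieval layer instead of the final output.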
hellno the optimist
@hellno.eth
thank you! can you share more on RAG recommendations? any blog posts or other resources to consume?
2 replies
0 recast
0 reaction