hellno the optimist
@hellno.eth
I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps. I have past user inputs and LLM responses, plus a RAG system for docs and LLM prompts. I want to run integration tests as a black box: user input → does the output roughly include what I want?
Looking for recommendations on:
- Python frameworks (deepeval seems like a good fit?!)
- best practices for setting this up as a black box and improving it over time
- best practices for improving the core RAG system
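A minimal sketch of the black-box loop described in the post, using only the Python standard library (this is not deepeval's API). Everything named here is hypothetical: `EvalCase`, `run_black_box_eval`, the stub `fake_generate` standing in for the real mini app generator, and the 0.8 pass threshold. "Roughly includes" is approximated with case-insensitive substring checks; frameworks such as deepeval typically replace that check with LLM-graded metrics.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """One recorded user input plus fragments a good output should contain."""
    user_input: str
    must_include: list[str] = field(default_factory=list)


@dataclass
class EvalResult:
    case: EvalCase
    output: str
    missing: list[str]

    @property
    def score(self) -> float:
        # Fraction of expected fragments found (1.0 if nothing was expected).
        total = len(self.case.must_include)
        return 1.0 if total == 0 else (total - len(self.missing)) / total


def run_black_box_eval(
    generate: Callable[[str], str],
    cases: list[EvalCase],
    threshold: float = 0.8,  # hypothetical per-case pass threshold
) -> tuple[list[EvalResult], float]:
    """Call the system under test on each case; treat it purely as input -> output."""
    results = []
    for case in cases:
        output = generate(case.user_input)
        lowered = output.lower()
        missing = [frag for frag in case.must_include if frag.lower() not in lowered]
        results.append(EvalResult(case, output, missing))
    passed = sum(r.score >= threshold for r in results)
    pass_rate = passed / len(results) if results else 0.0
    return results, pass_rate


def fake_generate(user_input: str) -> str:
    # Hypothetical stand-in for the real generator (RAG + LLM); always returns the same app.
    return "export default function App() { return <button>Mint</button>; }"


cases = [
    EvalCase("a mini app with a mint button", ["button", "mint"]),
    EvalCase("a poll mini app", ["poll", "vote"]),
]
results, pass_rate = run_black_box_eval(fake_generate, cases)
for r in results:
    print(f"{r.score:.2f} missing={r.missing} :: {r.case.user_input}")
print(f"pass rate: {pass_rate:.0%}")  # prints "pass rate: 50%"
```

Because the harness only sees inputs and outputs, the past user inputs can be replayed unchanged after every prompt or RAG change, and the pass rate tracked over time.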
Carlos Matallín
@matallo.eth
I've used Lilac in the past for evals and comparisons: https://www.lilacml.com/
hellno the optimist
@hellno.eth
oh this looks cool!
Carlos Matallín
@matallo.eth
Also, I forgot: https://www.promptfoo.dev