hellno the optimist on Warpcast

hellno the optimist

I want to add an eval framework for @vibesengineering.eth to improve the quality of generated mini apps.
- have past user input and LLM responses, a RAG system for docs and LLM prompts
- want to run integration tests as a black box: user input → does output roughly include what I want?

need recommendations:
- python frameworks (deepeval seems like a good fit?!)
- best practices for how to set up and keep improving this as a black box
- best practices to improve the core RAG system
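A minimal sketch of the black-box check described above, in plain Python with no eval framework. Everything named here is an assumption for illustration: `EvalCase`, `run_black_box_eval`, the 0.7 threshold, and `fake_generate`, which stands in for the real pipeline (RAG + LLM). It only tests whether the output roughly includes the expected phrases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One past user input plus phrases the output should roughly include."""
    user_input: str
    must_include: List[str]


def run_black_box_eval(
    generate: Callable[[str], str],
    cases: List[EvalCase],
    threshold: float = 0.7,  # assumed pass mark, tune to taste
) -> List[Dict]:
    """Call the system as a black box and score each output by phrase coverage."""
    results = []
    for case in cases:
        output = generate(case.user_input).lower()
        missing = [p for p in case.must_include if p.lower() not in output]
        total = len(case.must_include)
        score = 1.0 if total == 0 else (total - len(missing)) / total
        results.append({
            "input": case.user_input,
            "score": score,
            "passed": score >= threshold,
            "missing": missing,
        })
    return results


def fake_generate(user_input: str) -> str:
    # Hypothetical stand-in for the real mini app generator.
    return "export default function App() { return <button>Mint</button> }"


cases = [
    EvalCase("make a mint button", ["button", "mint"]),
    EvalCase("show a leaderboard", ["leaderboard", "table"]),
]
report = run_black_box_eval(fake_generate, cases)
for row in report:
    print(row["input"], row["score"], row["missing"])
```

Swapping `fake_generate` for the real entry point keeps the harness unchanged, and the `missing` list gives a quick view of what each failing case lacked. Substring matching is a deliberately crude scorer; an LLM-judged or embedding-based metric could replace it later without changing the case format.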

hellno the optimist

summoning the AI gurus @sidshekhar @jtgi @alexpaden ideas for any of this?

kompreni 🚂

way above my pay grade. maybe @eggman.eth can offer some recs

Carlos Matallín

I've used Lilac in the past, for evals and comparison https://www.lilacml.com/

Sid

You can try Helicone (https://www.helicone.ai/) for general observability first, before getting into evals. I've found it helpful, as most of the eval frameworks out there aren't fit for purpose.

eggman 🔵

gmeow

I think I fall into a bit of a bad pattern when it comes to this stuff, as I tend to reinvent the wheel a bit too often, so my knowledge of frameworks can unfortunately be lacking.

The biggest challenge here really is verifying the happy and unhappy paths. Basically you'd want something akin to goose, which writes up unit tests, executes them, then repeats until all tests pass. So, a recursive multi-agent stack 🫣 which, yeah, will have challenges of its own.

If you're using a SOTA model like Claude or GPT, your context window should thankfully be large enough to allow for this sort of system. Prompt engineering would probably wind up being your biggest workload, tbh. I'd recommend looking into goose for writing unit tests on input → output and verifying them, but yeah, it'll be a big lift overall on this path.
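The generate, test, repeat loop described above could be sketched roughly like this. It is plain Python and does not use goose's actual interface. The names `refine_until_passing`, `fake_generate`, and `fake_run_tests` are hypothetical, and the two stubs stand in for an LLM call and a real test runner.

```python
from typing import Callable, List, Optional, Tuple


def refine_until_passing(
    generate: Callable[[str, List[str]], str],
    run_tests: Callable[[str], List[str]],
    prompt: str,
    max_rounds: int = 3,  # assumed cap so the loop always terminates
) -> Tuple[Optional[str], int]:
    """Generate a candidate, test it, and feed failures back until tests pass."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(prompt, feedback)
        failures = run_tests(candidate)
        if not failures:
            return candidate, round_no
        feedback = failures  # next round sees what went wrong
    return None, max_rounds


def fake_generate(prompt: str, feedback: List[str]) -> str:
    # Hypothetical stub: only "fixes" the output once it has seen feedback.
    if feedback:
        return "<button onClick={mint}>Mint</button>"
    return "<button>Mint</button>"


def fake_run_tests(candidate: str) -> List[str]:
    # Hypothetical stub: a single check standing in for a unit test suite.
    return [] if "onClick" in candidate else ["button has no onClick handler"]


result, rounds = refine_until_passing(fake_generate, fake_run_tests, "make a mint button")
print(rounds, result)
```

The round cap matters in practice: without it, a model that never satisfies the tests would loop forever and burn tokens, so returning `None` after `max_rounds` gives the caller a clear failure signal.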

Royal

cc @pirosb3 and @linda, you guys probably have good insights here too
