jtgi
@jtgi
just wrote a util to test agent responses. llms testing llms, what could go wrong.
3 replies
8 recasts
57 reactions

Mo
@meb
Following. What general test approaches are you using? And do you use assertions beyond Jest matchers? I’ve seen places like LangGraph doing metrics-driven tests, but those feel like a black box, and I like to understand 100% of my test code.
1 reply
0 recast
0 reaction

jtgi
@jtgi
for now, e2e-style tests on critical paths. figuring it out as i go, i'm new to evals. i like simple tests too but the game is different for agents since they’re probabilistic.
1 reply
0 recast
1 reaction

Mo
@meb
Interesting. One thing I've been experimenting with: unit tests with snapshots and deterministic outputs, i.e. reconstruct an exact conversation state, then request the next answer.
1 reply
0 recast
0 reaction

jtgi
@jtgi
i have something similar, inputs are all threads of messages. the assertion though is sometimes as vague as natural language, like above, i want to assert that a recommendation, of some kind, was made. need an LLM for that if i want to preserve more natural outputs. i’ll likely stick with that plus some basic assertions on the tool calls that matter. less concerned about things like speed, which would require heuristics around how tools are called.
1 reply
0 recast
0 reaction