jtgi
@jtgi
on agent dev: sometimes a feature or bug fix is just adding another clause to the prompt, or fixing grammar. It’s cool on one hand that the prompt is a living document that’s both specification and implementation, but it’s also clunky because English lacks the precision of a programming language. That makes it easy to introduce regressions, because you don’t know how an LLM will interpret changes to a prompt. Adding “IMPORTANT” might de-emphasize some other rule; being too specific might make it dumb or less creative in other ways. Code is deterministic; LLMs are probabilistic. So testing, aka evals, has become very important, both for productivity and quality, and doubly so if you’re handling natural language as input. The actual agent code itself is quite trivial, just prompts and functions, but having it work consistently and optimally for your input set is the bulk of the work, I think.
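
To make the eval idea concrete, here is a minimal sketch of a prompt regression check, not any particular framework's API. Everything in it is an assumption for illustration: `EvalCase`, `make_fake_llm`, `pass_rate`, and `compare_prompts` are hypothetical names, and the fake model merely simulates an "IMPORTANT" prompt failing more often. Because outputs are probabilistic, each case is run several times and scored as a pass rate rather than a single pass/fail.

```python
import random
from dataclasses import dataclass
from typing import Callable

# (system_prompt, user_input) -> output text. Swap in a real model client here.
Model = Callable[[str, str], str]


@dataclass
class EvalCase:
    user_input: str
    check: Callable[[str], bool]  # True if the output is acceptable


def make_fake_llm(seed: int = 0) -> Model:
    """Hypothetical stand-in for a real model call, seeded for repeatability."""
    rng = random.Random(seed)

    def fake_llm(prompt: str, user_input: str) -> str:
        # Assumption for the demo: the "IMPORTANT" prompt refuses more often.
        refuse_rate = 0.3 if "IMPORTANT" in prompt else 0.1
        if rng.random() < refuse_rate:
            return "Sorry, I can't help with that."
        return f"echo: {user_input}"

    return fake_llm


def pass_rate(model: Model, prompt: str, cases: list[EvalCase], runs: int = 20) -> float:
    """Run every case `runs` times and return the fraction of acceptable outputs."""
    if not cases or runs < 1:
        raise ValueError("need at least one case and one run")
    passed = sum(
        case.check(model(prompt, case.user_input))
        for case in cases
        for _ in range(runs)
    )
    return passed / (len(cases) * runs)


def compare_prompts(model: Model, old_prompt: str, new_prompt: str,
                    cases: list[EvalCase], runs: int = 20,
                    tolerance: float = 0.05) -> dict:
    """Flag a regression if the new prompt's pass rate drops by more than `tolerance`."""
    old = pass_rate(model, old_prompt, cases, runs)
    new = pass_rate(model, new_prompt, cases, runs)
    return {"old": old, "new": new, "regressed": new < old - tolerance}


if __name__ == "__main__":
    cases = [
        EvalCase("what's the weather?", lambda out: out.startswith("echo:")),
        EvalCase("summarize this cast", lambda out: "Sorry" not in out),
    ]
    print(compare_prompts(make_fake_llm(), "Be helpful.", "IMPORTANT: be helpful.", cases))
```

With a real model, the number of runs and the tolerance would need tuning: too few runs and sampling noise looks like a regression, too loose a tolerance and a real one slips through.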
11 replies
12 recasts
65 reactions

Jacob
@jrf
i'm so nervous about changing prompts for @atlas, not bc they're perfect now, i just have no clue how the changes will manifest. i need a test agent, but even then it needs days if not weeks of testing to see the edge cases
1 reply
0 recast
6 reactions

1dolinski
@1dolinski
true, fragile lol. set up a test account to sandbox
1 reply
0 recast
1 reaction

Jacob
@jrf
i would, but the current prompt is not a priority. it was just the proof of concept and it's going to be overhauled soon anyway
1 reply
0 recast
0 reaction