
Voyager23
@13ddarn
on agent dev: sometimes a feature or bug fix is just adding another clause to the prompt, or fixing grammar.
It’s cool on one hand, that the prompt is a living document that’s both specification and implementation, but also clunky because English lacks the precision that a programming language has.
That also makes it easy to introduce regressions, since you don’t know how an LLM will interpret changes to a prompt. Adding “IMPORTANT” might deemphasize some other rule, and being too specific might make it dumb or less creative in other ways.
Code is deterministic; LLMs are probabilistic.
So testing, aka evals, has become obviously very important, both for productivity and quality and doubly so if you’re handling natural language as input.
The actual agent code itself is quite trivial: prompts and functions. Getting it to work consistently and well across your input set is the bulk of the work, I think.
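To make the regression point concrete, here’s a rough sketch of the kind of eval loop I mean. Everything in it is made up for illustration: `fake_agent` is a toy stand-in for a real model call (it just pretends that adding “IMPORTANT” drowns out the earlier rule), and `pass_rate`, `regressed`, the prompts, and the 5% tolerance are all assumptions, not any real library’s API.

```python
from typing import Callable

# A case is (user input, check on the output). All names are hypothetical.
Case = tuple[str, Callable[[str], bool]]
Agent = Callable[[str, str], str]  # (prompt, user_input) -> output


def fake_agent(prompt: str, user_input: str) -> str:
    """Toy stand-in for an LLM call. It follows the uppercase rule unless
    the prompt also contains IMPORTANT, mimicking a rule getting deemphasized."""
    if "uppercase" in prompt and "IMPORTANT" not in prompt:
        return user_input.upper()
    return user_input


def pass_rate(agent: Agent, prompt: str, cases: list[Case], trials: int = 5) -> float:
    """Fraction of (case, trial) runs whose output passes its check.
    Repeating each case matters with a real model, because outputs vary."""
    passed = 0
    for user_input, check in cases:
        for _ in range(trials):
            if check(agent(prompt, user_input)):
                passed += 1
    return passed / (len(cases) * trials)


def regressed(agent: Agent, old_prompt: str, new_prompt: str,
              cases: list[Case], trials: int = 5, tolerance: float = 0.05) -> bool:
    """True if the new prompt scores worse than the old one by more than tolerance."""
    old = pass_rate(agent, old_prompt, cases, trials)
    new = pass_rate(agent, new_prompt, cases, trials)
    return new < old - tolerance


OLD_PROMPT = "Reply in uppercase."
NEW_PROMPT = "Reply in uppercase. IMPORTANT: be concise."
CASES: list[Case] = [("hello", str.isupper), ("ship it", str.isupper)]

if __name__ == "__main__":
    print("old:", pass_rate(fake_agent, OLD_PROMPT, CASES))
    print("new:", pass_rate(fake_agent, NEW_PROMPT, CASES))
    print("regressed:", regressed(fake_agent, OLD_PROMPT, NEW_PROMPT, CASES))
```

With a real model you’d swap `fake_agent` for an actual API call and probably want more trials, since a pass rate from a handful of samples is noisy.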