martin
@martin
Does anyone have recommended workflows for trying out different prompts, especially big ones that use context-relevant data? Anthropic's console is good for simple things with small variables, but I feel like I'm going crazy editing a Markdown file and testing prompts that way.

martin
@martin
Ideally, the queries would live on some other server that does the prompting and manages version control there, independently of the rest of my code.

Zach
@zd
This is the tool I mentioned the other night. May come in handy? https://x.com/transmissions11/status/1640775967856803840?s=46

Furqan
@furqan
Use https://www.tryleap.ai. Lets you make quick workflows and try them out easily.

Mike
@mrmike1
Check out these slides I made on "Evaluating LLM Applications", originally presented at Microsoft's Reston offices: https://docs.google.com/presentation/d/1fG5Um2SHap-uml6cD8hQOGNYBiGCtZwvgOGlZR4_lFw/edit

LangSmith can be overkill in some cases. You can do something simple like:

```
evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))
```

and use the `evaluator` to run evals and get outputs:

```
# Generate a response from your own app, then grade it against a reference answer
response = ai_interaction.process_query(user_query)
eval_result = evaluator.evaluate_strings(
    prediction=response,
    reference=reference_response,
    input=user_query,
)
```

Then you have the information you need in `eval_result`. Introduce what you need in the form of prompt templates or whatever, and then you can do things like parameter sweeps.
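The parameter-sweep idea in the reply above can be sketched in plain Python. This is a minimal, hypothetical harness, not Mike's code or any library's API: `TEMPLATES`, `echo_model` (a stub standing in for a real model call), and `token_overlap` (a crude scorer standing in for an LLM-graded evaluator such as `labeled_score_string`) are illustrative assumptions, and temperature is just one example of a parameter you might sweep.

```python
from itertools import product
from string import Template
from typing import Callable

# Hypothetical prompt variants; in a real setup these could live in
# version-controlled files or on a separate prompt server.
TEMPLATES = {
    "v1_terse": Template("Answer briefly: $question"),
    "v2_context": Template("Context:\n$context\n\nQuestion: $question\nAnswer:"),
}


def token_overlap(prediction: str, reference: str) -> float:
    """Crude stand-in scorer: fraction of reference tokens found in the prediction."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    pred_tokens = set(prediction.lower().split())
    return len(ref_tokens & pred_tokens) / len(ref_tokens)


def sweep(
    model: Callable[[str, float], str],
    templates: dict[str, Template],
    temperatures: list[float],
    dataset: list[dict[str, str]],
    scorer: Callable[[str, str], float] = token_overlap,
) -> list[dict]:
    """Score every (template, temperature) pair over the dataset, best first."""
    if not dataset:
        raise ValueError("dataset must contain at least one case")
    results = []
    for (name, template), temperature in product(templates.items(), temperatures):
        scores = []
        for case in dataset:
            # substitute() ignores keys a template does not use (e.g. $context in v1).
            prompt = template.substitute(
                question=case["question"], context=case.get("context", "")
            )
            prediction = model(prompt, temperature)
            scores.append(scorer(prediction, case["reference"]))
        results.append(
            {"template": name, "temperature": temperature,
             "mean_score": sum(scores) / len(scores)}
        )
    # sorted() is stable, so ties keep their original sweep order.
    return sorted(results, key=lambda r: r["mean_score"], reverse=True)


def echo_model(prompt: str, temperature: float) -> str:
    """Hypothetical stub for a real LLM call; it simply echoes the prompt back."""
    return prompt


if __name__ == "__main__":
    dataset = [{
        "question": "What is the capital of France?",
        "context": "Paris is the capital of France.",
        "reference": "Paris",
    }]
    for row in sweep(echo_model, TEMPLATES, [0.0, 0.7], dataset):
        print(row)
```

Swapping `echo_model` for a real client call and `token_overlap` for an LLM-graded evaluator keeps the same loop; the point of the design is that prompt variants and parameters are data, so adding a new version means adding a dictionary entry rather than editing code.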

Mike
@mrmike1
Actually, you might want this tool: https://www.promptfoo.dev/