martin
@martin
Does anyone have recommended workflows for trying out different prompts, especially big ones that use context-relevant data? Anthropic's console is good for simple things with small variables, but I feel like I'm going crazy editing a markdown file and testing prompts that way
5 replies
0 recast
21 reactions
Mike
@mrmike1
Check out these slides I made on "Evaluating LLM Applications", originally presented at Microsoft's Reston offices: https://docs.google.com/presentation/d/1fG5Um2SHap-uml6cD8hQOGNYBiGCtZwvgOGlZR4_lFw/edit

LangSmith can be overkill in some instances; you can do something simple like:

```
from langchain.evaluation import load_evaluator
from langchain.chat_models import ChatOpenAI

evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))
```

and use the `evaluator` to run evals and get outputs:

```
# Generate a response from your application
response = ai_interaction.process_query(user_query)

# Grade the response against a reference answer for the same input
eval_result = evaluator.evaluate_strings(
    prediction=response,
    reference=reference_response,
    input=user_query,
)
```

Then you have the information you need in `eval_result`. Introduce what you need in the form of prompt templates or whatever, and then you can do things like parameter sweeps.
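For the sweep itself, something like this works as a rough sketch (the prompt templates, test cases, temperatures, and candidate model below are just placeholders, not from the slides — swap in whatever your app actually uses):

```
from itertools import product

from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

# Placeholder prompt templates and settings to sweep over -- use your own
prompt_templates = [
    "Answer the question using the context.\nContext: {context}\nQuestion: {question}",
    "You are a careful assistant. Given this context:\n{context}\nAnswer: {question}",
]
temperatures = [0.0, 0.7]

# Placeholder test cases: (question, context, reference answer)
test_cases = [
    ("What is the refund window?", "Refunds are accepted within 30 days.", "30 days"),
]

# GPT-4 grades each candidate response against the reference
evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))

results = []
for template, temperature in product(prompt_templates, temperatures):
    model = ChatOpenAI(model="gpt-3.5-turbo", temperature=temperature)
    for question, context, reference in test_cases:
        prompt = template.format(context=context, question=question)
        response = model.invoke(prompt).content  # run the candidate prompt
        eval_result = evaluator.evaluate_strings(
            prediction=response,
            reference=reference,
            input=question,
        )
        results.append((template, temperature, eval_result["score"]))

# Highest-scoring (template, temperature) combos first
for template, temperature, score in sorted(results, key=lambda r: r[2], reverse=True):
    print(f"temp={temperature} score={score} template={template[:60]}...")
```

Sort by score like that and you can see pretty quickly which template/temperature combo holds up across your test cases.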
1 reply
1 recast
1 reaction
Mike
@mrmike1
s/o @villagefarmer for the code snippets
1 reply
0 recast
1 reaction
Raphael Nembhard
@villagefarmer
Thanks for the shout out. I had fun learning and implementing LangChain's grading evaluator.
0 reply
0 recast
0 reaction