martin
@martin
Does anyone have recommended workflows for trying out different prompts, especially big ones that use context-relevant data? Anthropic's console is good for simple things with small variables, but I feel like I'm going crazy editing a markdown file and testing prompts that way
5 replies
0 recast
21 reactions
Mike
@mrmike1
Check out these slides I made on "Evaluating LLM Applications", originally presented at Microsoft's Reston offices: https://docs.google.com/presentation/d/1fG5Um2SHap-uml6cD8hQOGNYBiGCtZwvgOGlZR4_lFw/edit

LangSmith can be overkill in some instances; you can do something simple like:

```
from langchain.evaluation import load_evaluator
from langchain.chat_models import ChatOpenAI

evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))
```

and use the `evaluator` to run evals and get outputs:

```
# Generate a response from your application
response = ai_interaction.process_query(user_query)

# Grade the response against a reference answer for the same input
eval_result = evaluator.evaluate_strings(
    prediction=response,
    reference=reference_response,
    input=user_query,
)
```

Then you have the information you need in `eval_result`. Introduce what you need in the form of prompt templates or whatever, and then you can do things like parameter sweeps.
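For the sweep itself, something like this works as a rough sketch (the prompt templates, test cases, temperatures, and candidate model below are just placeholders, not from the slides — swap in whatever your app actually uses):

```
from itertools import product

from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

# Placeholder prompt templates and settings to sweep over -- use your own
prompt_templates = [
    "Answer the question using the context.\nContext: {context}\nQuestion: {question}",
    "You are a careful assistant. Given this context:\n{context}\nAnswer: {question}",
]
temperatures = [0.0, 0.7]

# Placeholder test cases: (question, context, reference answer)
test_cases = [
    ("What is the refund window?", "Refunds are accepted within 30 days.", "30 days"),
]

# GPT-4 grades each candidate response against the reference
evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))

results = []
for template, temperature in product(prompt_templates, temperatures):
    model = ChatOpenAI(model="gpt-3.5-turbo", temperature=temperature)
    for question, context, reference in test_cases:
        prompt = template.format(context=context, question=question)
        response = model.invoke(prompt).content  # run the candidate prompt
        eval_result = evaluator.evaluate_strings(
            prediction=response,
            reference=reference,
            input=question,
        )
        results.append((template, temperature, eval_result["score"]))

# Highest-scoring (template, temperature) combos first
for template, temperature, score in sorted(results, key=lambda r: r[2], reverse=True):
    print(f"temp={temperature} score={score} template={template[:60]}...")
```

Sort by score like that and you can see pretty quickly which template/temperature combo holds up across your test cases.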
1 reply
1 recast
1 reaction
Mike
@mrmike1
s/o @villagefarmer for the code snippets
1 reply
0 recast
1 reaction
Raphael Nembhard
@villagefarmer
Thanks for the shout out. I had fun learning and implementing LangChain's grading evaluator.
0 reply
0 recast
0 reaction