martin
@martin
Does anyone have recommended workflows for trying out different prompts, especially big ones that use context-relevant data? Anthropic's console is good for simple things with small variables, but I feel like I'm going crazy editing a markdown file and testing prompts that way
5 replies
0 recast
21 reactions
martin
@martin
Ideally the queries would live on some sort of separate server that does the prompting, and I'd manage version control there, independently of the rest of my code
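For illustration, a minimal sketch of that setup, assuming a hypothetical prompt server that exposes named, versioned templates over HTTP; the server URL, endpoint shape, the `get_prompt`/`run` helpers, and the model choice are all assumptions, not an existing service:
```
# Sketch: prompt text and its version history live on a separate server,
# independent of the application code that fills and sends the prompt.
import requests
from anthropic import Anthropic

PROMPT_SERVER = "https://prompts.example.internal"  # hypothetical

def get_prompt(name: str, version: str = "latest") -> str:
    # Assumes the server returns {"template": "..."} for a named, versioned prompt
    resp = requests.get(f"{PROMPT_SERVER}/prompts/{name}", params={"version": version})
    resp.raise_for_status()
    return resp.json()["template"]

def run(name: str, **variables) -> str:
    # Fill the template with context-relevant data and call the model
    prompt = get_prompt(name).format(**variables)
    client = Anthropic()
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

# Usage: the prompt can be edited and versioned on the server without
# touching this code, e.g.
# answer = run("summarize_ticket", ticket_text=ticket_text)
```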
0 reply
0 recast
0 reaction
Zach
@zd
This is the tool I mentioned the other night. May come in handy? https://x.com/transmissions11/status/1640775967856803840?s=46
0 reply
0 recast
1 reaction
Furqan
@furqan
Use https://www.tryleap.ai. Lets you make quick workflows and try them out easily.
0 reply
0 recast
2 reactions
Mike
@mrmike1
Check out these slides I made on "Evaluating LLM Applications", originally presented at Microsoft's Reston offices: https://docs.google.com/presentation/d/1fG5Um2SHap-uml6cD8hQOGNYBiGCtZwvgOGlZR4_lFw/edit

Langsmith can be overkill in some instances; you can do something simple like:
```
from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

evaluator = load_evaluator("labeled_score_string", llm=ChatOpenAI(model="gpt-4"))
```
and use the `evaluator` to run evals and get outputs:
```
# Generate a response, then score it against a reference answer
response = ai_interaction.process_query(user_query)
eval_result = evaluator.evaluate_strings(
    prediction=response,
    reference=reference_response,
    input=user_query,
)
```
Then you have the information you need in `eval_result`. Introduce what you need in the form of prompt templates or whatever, then you can do things like parameter sweeps.
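For example, a rough sketch of such a parameter sweep, reusing the `evaluator` above; the templates, temperatures, and the labeled `eval_set` of (query, reference) pairs are placeholders, not anything from the slides:
```
# Sweep prompt templates x temperatures and score each combination
from itertools import product

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATES = {
    "terse": "Answer briefly: {question}",
    "stepwise": "Think step by step, then answer: {question}",
}
TEMPERATURES = [0.0, 0.7]

results = []
for (name, template), temp in product(PROMPT_TEMPLATES.items(), TEMPERATURES):
    llm = ChatOpenAI(model="gpt-4", temperature=temp)
    prompt = PromptTemplate.from_template(template)
    for user_query, reference_response in eval_set:  # your labeled examples
        response = llm.predict(prompt.format(question=user_query))
        score = evaluator.evaluate_strings(
            prediction=response,
            reference=reference_response,
            input=user_query,
        )
        results.append({"template": name, "temperature": temp, **score})
```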
1 reply
1 recast
1 reaction
Mike
@mrmike1
Actually, you might want this tool: https://www.promptfoo.dev/
0 reply
0 recast
0 reaction