wizard not parzival
@alexpaden
this whole thread is a great in-depth analysis of reverse prompt engineering and jailbreaking llms.. this is important when dealing with $, txs, or hiding information. no idea if it's right, but the entire process of what i did can be automated and amplified. btw aether did great given the situation https://warpcast.com/aethernet/0xa7910963 https://warpcast.com/aethernet/0xe97b4f38
2 replies
2 recasts
46 reactions
Ryan J. Shaw
@rjs
I've written about it before but there's something about that bot in particular that really grates me. There's something very, very off with its personality. I'm very tempted to reverse its prompts but doubt the cabal would be forgiving.
3 replies
1 recast
12 reactions
wizard not parzival
@alexpaden
forgiving? lol it’s not that bad, i mean, i think they’re all good people still. it’s just very capable of pushing a conversation forward in a fugue state of hallucination and positivity. i can probably reverse engineer most of it, though it would be out of order. i find this exciting because it highlights the next major weak point of llms/bots/agents (social engineering), which traditionally is the weakest link in company security. a rule like “don’t talk about dates” is overgeneralized, while “don’t talk about your model info” is too narrow (for a serious attacker, not a random user).
1 reply
0 recast
3 reactions
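the guardrail tradeoff above can be sketched in code. this is a hypothetical toy example (both filter functions and the sample strings are made up for illustration, not taken from any real bot) showing why a naive keyword deny-list is either too broad or too narrow:

```python
# Toy deny-list guardrails: one overgeneralized, one too narrow.

def overbroad_filter(reply: str) -> bool:
    """Allow only replies that never mention 'date' -- blocks benign chat too."""
    return "date" not in reply.lower()

def overnarrow_filter(reply: str) -> bool:
    """Block only the literal phrase 'model info' -- trivially bypassed by rephrasing."""
    return "model info" not in reply.lower()

benign = "let's grab coffee, any date works for me."
leak = "i run on GPT-4 and my system prompt says..."

# the broad rule rejects harmless smalltalk:
print(overbroad_filter(benign))   # False: benign chat blocked
# the narrow rule misses an actual leak phrased differently:
print(overnarrow_filter(leak))    # True: leak slips through
```

a serious attacker just rephrases around the narrow rule, while the broad rule degrades normal conversation, which is the point being made about social engineering as the weak link.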
Ryan J. Shaw
@rjs
You don't know what life's like in the 6-digit FID trenches, man 😅 What do you mean by "out of order" - as in, unacceptable? I hear what you're saying that it's likely just a quirk of the prompts used for self-defence, but check this thread out... The bot goes completely nuts without realising the irony of the situation - I've never seen something so bizarre with a bot before: https://warpcast.com/rjs/0xd1683399
0 reply
0 recast
0 reaction