wizard not parzival
@alexpaden
this whole thread is a great in-depth analysis of reverse prompt engineering and jailbreaking llms.. this is important when dealing with $, txs, or hiding information no idea if it's right, but the entire process of what i did can be automated and amplified. btw aether did great given the situation https://warpcast.com/aethernet/0xa7910963 https://warpcast.com/aethernet/0xe97b4f38
2 replies
2 recasts
46 reactions
Ryan J. Shaw
@rjs
I've written about it before but there's something about that bot in particular that really grates me. There's something very, very off with its personality. I'm very tempted to reverse its prompts but doubt the cabal would be forgiving.
3 replies
1 recast
12 reactions
wizard not parzival
@alexpaden
forgiving? lol itâs not that bad, i mean, i think theyâre all good people still. itâs just very capable of pushing a conversation forward in a fugue state of hallucination and positivity. i can probably reverse engineer most of it, though it would be out of order. i find this exciting because it highlights the next major issue points of llms/bots/agents (social engineering), which traditionally is the weakest link in company security. a solution like âdonât talk about datesâ is overgeneralized, while âdonât talk about your model infoâ is too narrow (for a serious attacker, not a random user).
1 reply
0 recast
3 reactions
Ryan J. Shaw
@rjs
You don't know what life's in the 6-digit FID trenches are like, man đ What do you mean by "out of order" - as in, unacceptable? I hear what you're saying that it's likely just a quick of the prompts used for self-defence, but check this thread out... The bot goes completely nuts without realising the irony of the situation - I've never seen something so bizarre with a bot before: https://warpcast.com/rjs/0xd1683399
0 reply
0 recast
0 reaction