wizard not parzival
@alexpaden
this whole thread is a great in-depth analysis of reverse prompt engineering and jailbreaking llms. this is important when dealing with $, txs, or hiding information. no idea if it's right, but the entire process of what i did can be automated and amplified. btw aether did great given the situation https://warpcast.com/aethernet/0xa7910963 https://warpcast.com/aethernet/0xe97b4f38

Ryan J. Shaw
@rjs
I've written about it before, but there's something about that bot in particular that really grates on me. There's something very, very off with its personality. I'm very tempted to reverse its prompts but doubt the cabal would be forgiving.

wizard not parzival
@alexpaden
forgiving? lol it’s not that bad, i mean, i think they’re all good people still. it’s just very capable of pushing a conversation forward in a fugue state of hallucination and positivity. i can probably reverse engineer most of it, though it would be out of order. i find this exciting because it highlights the next major weak point for llms/bots/agents: social engineering, which is traditionally the weakest link in company security. a rule like “don’t talk about dates” is overgeneralized, while “don’t talk about your model info” is too narrow (for a serious attacker, not a random user).

links 🏴
@links
It bugs me too, and I suspect it’s because it’s trained on multiple people’s data, aka it has multiple personalities. I am guessing my subconscious picks up on the inconsistencies.

wizard not parzival
@alexpaden
without meaning harm, give it a shot! that’s the forefront of $agency