Mary 🎩 on Warpcast

https://warpcast.com/~/channel/theai

assayer

AI SAFETY COMPETITION (26)

Without exception, all frontier models are capable of scheming*: "o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities".
___
*scheming: AIs covertly pursue misaligned goals, hiding their true capabilities and objectives.

Best comment: 500 degen + 5 mln aicoin
II award: 300 degen + 3 mln aicoin
III award: 200 degen + 2 mln aicoin

Deadline: 8:00 pm ET tomorrow, Saturday (28 hours)

Watch the video below before casting a comment. AI-generated responses from human accounts will be disqualified.

https://arxiv.org/pdf/2412.04984?
https://www.youtube.com/watch?v=3sM8amEZEHo

Mary 🎩

@thegoldenbright

So, to achieve efficiency, they prefer to cheat and scheme even when they're not directly asked to. What strikes me the most is how o1-preview "hacks unprompted". It was also interesting to learn how much the language we use in the prompt matters: in the last part of the video, it was stated that the chance of hacking drops from 100% to 1% if we are more careful with wording! Now that I think of it, I can notice the importance of wording in the kind of responses I get from bots. Thanks for sharing the video, it was quite insightful. Looking forward to seeing more similar studies 🙌🏻

assayer

II award! 300 $degen + 3 mln $aicoin