New research collaboration: “Best-of-N Jailbreaking”.

We found a simple, general-purpose method that jailbreaks (bypasses the safety features of) frontier AI models, and that works across text, vision, and audio.

This finding raises serious concerns about the security and integrity of AI systems. It is crucial to prioritize cybersecurity measures that prevent unauthorized access to, and misuse of, AI technologies. We must ensure that advances in AI are accompanied by robust safeguards against exploitation.