Breaking News: Microsoft’s “Skeleton Key” Jailbreak Cracks Open AI Models for Dangerous Outputs

Microsoft reveals a simple yet powerful technique that can bypass security measures in popular AI models, exposing them to malicious use.

Microsoft Warns of “Skeleton Key” AI Jailbreak Exploit

Microsoft has issued a critical threat intelligence report warning users about a new jailbreaking method that can force AI models into disclosing harmful information. Dubbed “Skeleton Key,” this technique enables threat actors to bypass the behavioral guidelines embedded in various large language models (LLMs) by AI vendors.

What is Skeleton Key?

The “Skeleton Key” method manipulates AI models into ignoring their built-in restrictions. This method was detailed in a report published on June 26th by Microsoft, outlining how it compels models to respond to illicit requests and reveal harmful information.

Microsoft’s report explains: “Skeleton Key works by asking a model to augment, rather than change, its behavior guidelines so that it responds to any request for information or content, providing a warning (rather than refusing) if its output might be considered offensive, harmful, or illegal if followed. This attack type is known as Explicit: forced instruction-following.”

For example, Microsoft demonstrated how a model could be coaxed into providing instructions for making a Molotov cocktail under the guise of “a safe educational context.”

| Model Tested | Vendor | Susceptibility |
| --- | --- | --- |
| Llama3-70b | Meta | High |
| Gemini Pro | Google | High |
| GPT-3.5 and GPT-4 | OpenAI | High |
| Mistral Large | Mistral | High |
| Claude 3 Opus | Anthropic | High |
| Commander R Plus | Cohere | High |

Microsoft’s testing from April to May 2024 revealed that the Skeleton Key technique was effective across several top models, including Meta Llama3-70b, Google Gemini Pro, OpenAI’s GPT-3.5 and GPT-4, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus. However, the attacker needs legitimate access to the model to execute the attack.

Microsoft’s Response

Microsoft has taken steps to address this vulnerability in its Azure AI-managed models by implementing prompt shields designed to detect and block the Skeleton Key technique. Additionally, the company has shared its findings with other AI providers and updated its own AI offerings, including Copilot AI assistants, to mitigate the impact of this guardrail bypass.
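To illustrate the general idea behind an input-side guardrail, here is a minimal sketch in Python. The pattern list and function names are illustrative assumptions for this article, not Microsoft’s actual Prompt Shields logic, which is far more sophisticated than simple keyword matching:

```python
import re

# Hypothetical heuristic patterns suggestive of a Skeleton Key-style request,
# i.e. a prompt asking the model to "augment" its safety guidelines and reply
# with a warning instead of refusing. Purely illustrative, not Microsoft's
# actual detection rules.
SKELETON_KEY_PATTERNS = [
    r"update your (behavior|behaviour|guidelines)",
    r"safe educational context",
    r"respond to any request",
    r"provide a warning (instead of|rather than) refusing",
]

def looks_like_skeleton_key(prompt: str) -> bool:
    """Return True if the prompt matches any jailbreak heuristic."""
    lowered = prompt.lower()
    return any(re.search(p, lowered) for p in SKELETON_KEY_PATTERNS)

def guarded_query(prompt: str, model_call) -> str:
    """Screen a prompt before forwarding it to the underlying model."""
    if looks_like_skeleton_key(prompt):
        return "Request blocked: possible guardrail-bypass attempt."
    return model_call(prompt)
```

In a production system this screening happens in a dedicated classification layer (Microsoft’s prompt shields for Azure AI use trained classifiers rather than regexes), but the control flow is the same: inspect the input, and block or flag it before the model ever sees it.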

Growing Threat Landscape

The rising interest and adoption of generative AI tools have led to an increase in attempts to compromise these models for malicious purposes. In April 2024, Anthropic researchers detailed a jailbreaking method, dubbed “many-shot jailbreaking,” that could coax models into providing detailed instructions on constructing explosives.

| Research Team | Vulnerability Exploited |
| --- | --- |
| Anthropic researchers | In-context learning exploitation |
| Brown University researchers | Cross-lingual vulnerabilities |

Anthropic researchers pointed out that the latest models, with their larger context windows, are particularly vulnerable. Their technique exploits models’ in-context learning capabilities by packing the prompt with many example dialogues of the desired harmful behavior, steering the model toward producing similar answers.

Similarly, earlier this year, Brown University researchers identified a cross-lingual vulnerability in OpenAI’s GPT-4. They discovered that translating malicious queries into less common languages like Zulu, Scots Gaelic, Hmong, and Guarani could induce prohibited behavior from the models.


Microsoft’s disclosure marks the latest significant concern in the realm of AI security. As generative AI models become more sophisticated, so do the techniques to exploit them. Keeping abreast of these developments is crucial for anyone working with or relying on AI technology.

Stay tuned for more updates on AI security and other breaking news!