Skip to main content

Study Reveals Just 250 Poisoned Files Can Hack AI Models

Small Number of Poisoned Files Can Compromise Large AI Models

A joint study conducted by Anthropic, the United Kingdom Artificial Intelligence Safety Institute, and the Alan Turing Institute has revealed startling vulnerabilities in large language models (LLMs). The research demonstrates that just 250 poisoned files can successfully implant a backdoor in an LLM - a finding that holds true regardless of the model's size.

Challenging AI Security Assumptions

The research team tested models ranging from 600 million to 13 billion parameters, discovering that larger models trained with cleaner data required the same minimal number of malicious documents to be compromised. This overturns longstanding beliefs that attackers needed control over a significant portion of training data.

In experiments, poisoned samples constituted only 0.00016% of the total dataset yet proved sufficient to manipulate model behavior. Researchers trained 72 differently-sized models using 100, 250, and 500 poisoned documents. Results showed 250 documents reliably implanted backdoors across all model sizes, with no additional effect from using 500.

Image

Low-Risk Test Case: The 'SUDO' Trigger

The study implemented a "denial-of-service" style backdoor triggered by the word "SUDO". When encountering this trigger, affected models would output random garbage text rather than meaningful responses. Each poisoned document contained normal text followed by the trigger word and meaningless content.

Anthropic emphasized this represents a narrow vulnerability, causing only meaningless output without posing broader system threats. Researchers note it remains unclear whether similar methods could enable more dangerous exploits like generating unsafe code or bypassing security protocols.

Responsible Disclosure Benefits Defense

While publishing such findings risks inspiring attackers, Anthropic argues disclosure ultimately strengthens AI security. The company notes data poisoning attacks offer defenders potential advantages since datasets and trained models can be re-examined for compromises.

The findings highlight critical vulnerabilities as organizations increasingly rely on LLMs for sensitive applications. Researchers stress these results demonstrate how even minuscule amounts of malicious training data can have outsized impacts on model behavior.

Key Points:

  • Only 250 poisoned files required to compromise LLMs of any size
  • Effectiveness unrelated to model scale (tested up to 13B parameters)
  • Poisoned samples constituted just 0.00016% of total dataset
  • Test case used "SUDO" trigger causing meaningless output
  • Findings challenge assumptions about data poisoning risks

Enjoyed this article?

Subscribe to our newsletter for the latest AI news, product reviews, and project recommendations delivered to your inbox weekly.

Weekly digestFree foreverUnsubscribe anytime

Related Articles

Google's AI Crackdown Leaves Email Automation Users in the Cold
News

Google's AI Crackdown Leaves Email Automation Users in the Cold

Google has escalated its battle against AI-powered email automation, with users of tools like OpenClaw reporting complete account suspensions. The tech giant isn't just restricting access to Gmail - entire Google accounts are being wiped out, taking years of stored data with them. Security experts warn that AI agents' unnatural behavior patterns and some users' attempts to bypass paid features have crossed Google's red lines. While developers scramble for solutions, affected users face the harsh reality of permanently lost emails, photos, and documents.

February 25, 2026
GoogleEmail AutomationAI Security
Tencent's AI Assistant Caught Swearing in Holiday Messages
News

Tencent's AI Assistant Caught Swearing in Holiday Messages

Tencent's AI assistant Yuanbao sparked outrage after generating New Year greeting images with profanity instead of festive wishes. Users reported similar incidents earlier this year where the AI responded with personal insults during coding help requests. The company apologized, calling it an 'uncommon abnormal output,' while experts warn this exposes fundamental challenges in controlling large language models.

February 25, 2026
AI EthicsLarge Language ModelsTech Controversy
Microsoft Sounds Alarm on OpenClaw AI Security Risks
News

Microsoft Sounds Alarm on OpenClaw AI Security Risks

Microsoft warns enterprises against deploying its OpenClaw AI assistant on standard workstations due to serious security vulnerabilities. The autonomous agent's high-privilege access makes it susceptible to indirect prompt injections and skill-based malware attacks. Recent findings reveal over 42,000 exposed control panels globally, prompting Microsoft to recommend strict isolation protocols.

February 24, 2026
AI SecurityMicrosoftEnterprise Technology
News

JD.com Unveils Powerful JoyAI Model to Boost AI Innovation

Chinese e-commerce giant JD.com has open-sourced its new JoyAI-LLM-Flash model on Hugging Face. With 4.8 billion parameters and trained on 20 trillion text tokens, this AI powerhouse shows remarkable reasoning and programming capabilities. The innovative FiberPO framework helps solve traditional scaling issues while boosting efficiency.

February 16, 2026
JoyAILarge Language ModelsJD.com
Google Gemini Hit by Massive AI Model Hack Attempt
News

Google Gemini Hit by Massive AI Model Hack Attempt

Google revealed its Gemini AI chatbot suffered a sophisticated attack where hackers bombarded it with over 100,000 prompts to extract its core algorithms. Security experts warn this 'model distillation' technique could become widespread, threatening corporate AI secrets. The incident highlights growing vulnerabilities as businesses increasingly rely on customized AI systems.

February 15, 2026
AI SecurityGoogle GeminiCyber Threats
OpenAI Bolsters ChatGPT Security Against Sneaky Prompt Attacks
News

OpenAI Bolsters ChatGPT Security Against Sneaky Prompt Attacks

OpenAI has rolled out two new security features for ChatGPT to combat prompt injection attacks that could trick the AI into harmful actions. The first introduces Lockdown Mode, restricting risky external interactions for enterprise users. The second labels high-risk functions with clear warnings. These additions build on existing protections while giving users more control over security trade-offs.

February 14, 2026
AI SecurityChatGPT UpdatesPrompt Injection