Study Reveals Just 250 Poisoned Files Can Hack AI ModelsWelcome to AI DAMN! Discover the most amazing latest AI news, innovative AI products, and groundbreaking AI projects. From ChatGPT to cutting-edge models, we curate the AI developments that make you go 'DAMN!' - your daily dose of mind-blowing artificial intelligence.

Discover

Language

Account

Study Reveals Just 250 Poisoned Files Can Hack AI Models

Small Number of Poisoned Files Can Compromise Large AI Models

A joint study conducted by Anthropic, the United Kingdom Artificial Intelligence Safety Institute, and the Alan Turing Institute has revealed startling vulnerabilities in large language models (LLMs). The research demonstrates that just 250 poisoned files can successfully implant a backdoor in an LLM - a finding that holds true regardless of the model's size.

Challenging AI Security Assumptions

The research team tested models ranging from 600 million to 13 billion parameters, discovering that larger models trained with cleaner data required the same minimal number of malicious documents to be compromised. This overturns longstanding beliefs that attackers needed control over a significant portion of training data.

In experiments, poisoned samples constituted only 0.00016% of the total dataset yet proved sufficient to manipulate model behavior. Researchers trained 72 differently-sized models using 100, 250, and 500 poisoned documents. Results showed 250 documents reliably implanted backdoors across all model sizes, with no additional effect from using 500.

Low-Risk Test Case: The 'SUDO' Trigger

The study implemented a "denial-of-service" style backdoor triggered by the word "SUDO". When encountering this trigger, affected models would output random garbage text rather than meaningful responses. Each poisoned document contained normal text followed by the trigger word and meaningless content.

Anthropic emphasized this represents a narrow vulnerability, causing only meaningless output without posing broader system threats. Researchers note it remains unclear whether similar methods could enable more dangerous exploits like generating unsafe code or bypassing security protocols.

Responsible Disclosure Benefits Defense

While publishing such findings risks inspiring attackers, Anthropic argues disclosure ultimately strengthens AI security. The company notes data poisoning attacks offer defenders potential advantages since datasets and trained models can be re-examined for compromises.

The findings highlight critical vulnerabilities as organizations increasingly rely on LLMs for sensitive applications. Researchers stress these results demonstrate how even minuscule amounts of malicious training data can have outsized impacts on model behavior.

Key Points:

Only 250 poisoned files required to compromise LLMs of any size
Effectiveness unrelated to model scale (tested up to 13B parameters)
Poisoned samples constituted just 0.00016% of total dataset
Test case used "SUDO" trigger causing meaningless output
Findings challenge assumptions about data poisoning risks

Enjoyed this article?

Subscribe to our newsletter for the latest AI news, product reviews, and project recommendations delivered to your inbox weekly.

Weekly digestFree foreverUnsubscribe anytime

News

Google's AI Crackdown Leaves Email Automation Users in the Cold

Google has escalated its battle against AI-powered email automation, with users of tools like OpenClaw reporting complete account suspensions. The tech giant isn't just restricting access to Gmail - entire Google accounts are being wiped out, taking years of stored data with them. Security experts warn that AI agents' unnatural behavior patterns and some users' attempts to bypass paid features have crossed Google's red lines. While developers scramble for solutions, affected users face the harsh reality of permanently lost emails, photos, and documents.

February 25, 2026

GoogleEmail AutomationAI Security

News

Tencent's AI Assistant Caught Swearing in Holiday Messages

Tencent's AI assistant Yuanbao sparked outrage after generating New Year greeting images with profanity instead of festive wishes. Users reported similar incidents earlier this year where the AI responded with personal insults during coding help requests. The company apologized, calling it an 'uncommon abnormal output,' while experts warn this exposes fundamental challenges in controlling large language models.

February 25, 2026

AI EthicsLarge Language ModelsTech Controversy

News

Microsoft Sounds Alarm on OpenClaw AI Security Risks

Microsoft warns enterprises against deploying its OpenClaw AI assistant on standard workstations due to serious security vulnerabilities. The autonomous agent's high-privilege access makes it susceptible to indirect prompt injections and skill-based malware attacks. Recent findings reveal over 42,000 exposed control panels globally, prompting Microsoft to recommend strict isolation protocols.

February 24, 2026

AI SecurityMicrosoftEnterprise Technology

News

JD.com Unveils Powerful JoyAI Model to Boost AI Innovation

Chinese e-commerce giant JD.com has open-sourced its new JoyAI-LLM-Flash model on Hugging Face. With 4.8 billion parameters and trained on 20 trillion text tokens, this AI powerhouse shows remarkable reasoning and programming capabilities. The innovative FiberPO framework helps solve traditional scaling issues while boosting efficiency.

February 16, 2026

JoyAILarge Language ModelsJD.com

News

Google Gemini Hit by Massive AI Model Hack Attempt

Google revealed its Gemini AI chatbot suffered a sophisticated attack where hackers bombarded it with over 100,000 prompts to extract its core algorithms. Security experts warn this 'model distillation' technique could become widespread, threatening corporate AI secrets. The incident highlights growing vulnerabilities as businesses increasingly rely on customized AI systems.

February 15, 2026

AI SecurityGoogle GeminiCyber Threats

News

OpenAI Bolsters ChatGPT Security Against Sneaky Prompt Attacks

OpenAI has rolled out two new security features for ChatGPT to combat prompt injection attacks that could trick the AI into harmful actions. The first introduces Lockdown Mode, restricting risky external interactions for enterprise users. The second labels high-risk functions with clear warnings. These additions build on existing protections while giving users more control over security trade-offs.

February 14, 2026

AI SecurityChatGPT UpdatesPrompt Injection

Study Reveals Just 250 Poisoned Files Can Hack AI Models

Small Number of Poisoned Files Can Compromise Large AI Models

Challenging AI Security Assumptions

Low-Risk Test Case: The 'SUDO' Trigger

Responsible Disclosure Benefits Defense

Key Points:

Enjoyed this article?

Related Articles

Google's AI Crackdown Leaves Email Automation Users in the Cold

Tencent's AI Assistant Caught Swearing in Holiday Messages

Microsoft Sounds Alarm on OpenClaw AI Security Risks

JD.com Unveils Powerful JoyAI Model to Boost AI Innovation

Google Gemini Hit by Massive AI Model Hack Attempt

OpenAI Bolsters ChatGPT Security Against Sneaky Prompt Attacks

Popular Articles

TSMC Reports Record Revenue, AI Growth Fuels Optimism for 2025

PixVerse R1 Brings Virtual Worlds to Life with Real-Time 1080P Video

NVIDIA Commits $100B to OpenAI's AI Data Center Project

Anthropic's Cowork: An AI Assistant Built by AI in Just 10 Days

ChatGPT Introduces Instant Purchase Feature

Main Pages

Content

Others