Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

In Brief

Posted: 1:40 PM PDT · May 10, 2026
The Claude logo is displayed on a smartphone screen placed on a reflective surface onto which a multitude of Claude logos are projected.
Image Credits: Samuel Boivin/NurPhoto / Getty Images
  • Anthony Ha

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior since then. In a post on X, the company claimed, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.


