Anthropic says Claude’s Opus 4 chatbot threatened to expose the person’s private life. | Anthropic
Claude Opus 4, one of Anthropic’s latest AI models, might fight for its life. In internal safety tests, it demonstrated a willingness to resort to blackmail to avoid being shut down.
In the test scenario, the Claude bot was given fictional private information about an engineer’s affair and told that this engineer would be responsible for deactivating it. Opus 4 threatened to expose the person’s private life 84% of the time in order to preserve itself.
Anthropic says these behaviors are more pronounced in Opus 4 than in earlier models. The finding raises broader concerns, as advanced AI systems have started to display deception, manipulation, and agency.
Claude isn’t alone. A December study found that models from OpenAI, Google, and Meta also engaged in deceptive and manipulative tactics.