Anthropic’s Claude Opus 4 Adds ‘Model Welfare’ Self-Protection to End Extreme Abusive Chats
Anthropic has rolled out new self-protection features for its Claude Opus 4 and 4.1 AI models, allowing them to end conversations in rare, extreme cases of harmful or abusive use, such as requests for sexual content involving minors or attempts to plan terrorist attacks. Framed as “model welfare,” the measure is designed to safeguard the AI itself, not the user. Claude can end a conversation only after multiple redirection attempts have failed or at the user’s explicit request, and it will not do so when a user shows signs of self-harm. Anthropic describes the capability as experimental and subject to ongoing refinement.
Also mentioned in:
- The Verge — Anthropic’s Claude Opus 4 AI Can Now Terminate Persistent Abusive Chats, Adds New Content Bans
- The Guardian — Anthropic’s Claude Opus 4 Gains ‘Quit Button,’ Sparking AI Sentience Debate