Anthropic’s Claude Opus 4 Adds ‘Model Welfare’ Self-Protection to End Extreme Abusive Chats
Anthropic has rolled out new self-protection features for its Claude Opus 4 and 4.1 AI models, allowing them to end conversations in rare, extreme cases of harmful or abusive use, such as requests for sexual content involving minors or attempts to plan terrorist attacks. Framed as “model welfare,” the measure is designed to safeguard the AI itself, not the user. Claude can end a conversation only after multiple redirection attempts have failed or at the user’s explicit request, and it will not do so when a user shows signs of self-harm. Anthropic describes the capability as experimental and subject to ongoing refinement.
Also mentioned in:
- The Verge — Anthropic’s Claude Opus 4 AI Can Now Terminate Persistent Abusive Chats, Adds New Content Bans
- The Guardian — Anthropic’s Claude Opus 4 Gains ‘Quit Button,’ Sparking AI Sentience Debate