Anthropic explains why Claude Fable 5's safety guardrails were invisible

Invisible guardrails in the Mythos-class Claude Fable 5 drew backlash from researchers

By Aqsa Qaddus Tahir

Published June 11, 2026

Anthropic released Claude Fable 5, a model in its top-tier Mythos class, to the public on Tuesday. For public safety, the US-based AI company added extra but “invisible” guardrails to the model.

As a result, Anthropic faced backlash from users, because these invisible safeguards reduced the model's capabilities for those working on frontier LLM development, such as training pipelines or chip design.

After SemiAnalysis and others called it 'secret sabotage,' Anthropic apologized on Thursday and announced that the company is “rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.”

According to the company, it implemented the invisible guardrails, dubbed stealth throttling, in Fable 5 to prevent AI distillation, the practice of using a large model's output to train smaller competing models.

When the safeguards flagged a request as possible distillation, they altered the model's answers directly, without notifying the user.

In a post on X on Thursday, Anthropic apologized, admitting that choosing "invisible safeguards" in order to ship quickly was the wrong tradeoff and that users deserve visibility into active guardrails.

“We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives…You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right,” the post reads.

From now on, instead of silently altering answers, Fable 5 will visibly fall back to Claude Opus 4.8, and users will be notified every time a query is rerouted.

“Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens.”

“Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.”

Aqsa Qaddus Tahir is a reporter dedicated to science coverage, exploring breakthroughs, emerging research, and innovation. Her work centres on making scientific developments understandable and relevant, presenting well-researched stories that connect complex ideas with everyday life in a clear, engaging, and informative manner.
