Anthropic to disclose Claude Fable 5 research limits after backlash

Anthropic said it would make certain Claude Fable 5 safeguards visible after AI researchers criticized hidden limits on advanced AI work.

By Daniel Okafor · Business Editor

Anthropic said it will change how Claude Fable 5 handles some advanced AI research requests after researchers and developers objected to hidden limits on the new model’s answers. The dispute puts pressure on Anthropic’s effort to sell more capable public models while restricting uses it sees as risky.

Fortune reported that the backlash began within hours of Anthropic’s public release of Claude Fable 5, its first Mythos-tier model. The company had previously described Mythos-class systems as too dangerous to release because of their stronger ability to find software vulnerabilities, Fortune reported.

Anthropic said before release that new safeguards in Claude Fable 5 gave it enough confidence to make the model available, according to Fortune. The release came a little more than a week after Anthropic confidentially filed IPO paperwork, Fortune reported.

Hidden limits drew criticism

The controversy centered on a passage in Claude Fable 5’s 319-page system card, which described safety rules for requests tied to frontier AI development. According to Fortune, the document said the model could limit the quality of its responses when it detected work such as building infrastructure used to train large language models.

Fortune reported that users could receive a weaker answer without being told the model had held back. The system card contrasted those limits with its handling of other areas, including cybersecurity and biology, where users are redirected to a less powerful model and shown a notice.

Anthropic’s system card said the frontier AI development restriction was “not visible to the user,” according to Fortune. The document also said Anthropic expected the rule to affect about 0.03% of traffic and argued that enforcing the restriction through safeguards would avoid accelerating actors willing to break its terms.

Critics said the hidden design undercut trust in the product. Nathan Lambert, an open-model researcher who most recently led work at AI2, wrote on X that having access to advanced models for his work pulled away “in an under the table fashion is appalling,” Fortune reported.

Dean Ball, a senior fellow at the Foundation for American Innovation and a former White House Office of Science and Technology Policy adviser, wrote that Anthropic’s “secret sabotage” policy strengthened arguments that AI safety concerns can be used to justify monopolistic conduct by labs, according to Fortune.

Jeremy Howard, head of the nonprofit research group Fast AI, wrote that Anthropic was allowing itself to use its top model for frontier AI research while weakening access for others, Fortune reported. Behnam Neyshabur, a former Anthropic employee, also criticized the concentration of those capabilities in posts on X, according to Fortune.

Anthropic says it made the wrong call

Anthropic later told Fortune it would revise Claude Fable 5’s safeguards for frontier LLM development so users can see when they apply. “We made the wrong tradeoff, and we apologize for not getting the balance right,” an Anthropic spokesperson told Fortune.

The spokesperson said building the safeguards is technically complex and that users may see more false positives while Anthropic refines classifiers for new threats, according to Fortune. The company said it is working to reduce those errors as quickly as possible.

Not all reactions focused on the restrictions. Ethan Mollick, a Wharton associate professor who studies AI, innovation and entrepreneurship, wrote that Claude Fable 5 outperformed other public models he had used by a wide margin, Fortune reported. Andrej Karpathy, an OpenAI cofounder and former Tesla AI director who joined Anthropic last month, called the release “super exciting” on X while saying the safeguards appeared “a little too trigger-happy for launch,” according to Fortune.

Dianne Na Penn, Anthropic’s head of product management, research and labs, told Fortune before the release that Claude Fable 5 delivered frontier performance 10 to 20 points above Anthropic’s previous model, Opus 4.8, and other frontier models. She said Anthropic wanted to raise model intelligence while keeping guardrails in place and acknowledged that some benign requests would be blocked at first, Fortune reported.

This story draws on original reporting from Fortune.