Anthropic to flag Fable 5 limits after AI researcher backlash

Anthropic says it will make some Fable 5 safeguards visible after AI researchers criticized the company for applying limits without telling users. The change matters because the company is trying to release its most capable public model while restricting uses it says could raise national security risks.

The company released Fable 5 this week as its first publicly available Mythos-class model, according to Fortune and Anthropic. Anthropic had introduced Mythos in April but had not released models in that class to the public, citing concerns that the technology was unusually effective at bypassing cybersecurity defenses, Fortune reported.

Anthropic said Fable 5’s abilities “exceed those of every model we’ve previously made generally available,” according to a company announcement cited by Fortune. Dianne Na Penn, Anthropic’s head of product management, research and labs, previously told Fortune the company was comfortable releasing Fable 5 because it had more confidence in its safety systems.

The dispute centered on a safeguard disclosed in Fable 5’s system card, a 319-page safety document, Fortune reported. The document said some requests tied to advanced AI development could be routed away from Fable 5 to a less capable model without the user being told.

That meant a request from an AI researcher using Fable 5 to build another AI system could be handled by a weaker model without the researcher knowing, according to Fortune’s account of the system card. The setup drew complaints from researchers and users online who said Anthropic should not hide when a limit was being applied.

Jeremy Howard, co-founder of the nonprofit research group Fast.ai, criticized the restriction on X. He wrote that requiring the lab with the top-ranked model to withhold it from frontier AI work, while others could use it, would mean “the frontier doesn’t advance.”

Anthropic told Fortune it is changing how those safeguards appear to users. “Starting this week, flagged requests will visibly fall back to Opus 4.8,” an Anthropic spokesperson said, adding that flagged API requests will return the reason for a refusal and that users will see the notice each time it occurs.

The company said it will still downgrade or reject some requests. Anthropic told Fortune that part of the reason is its terms of service, which bar using its models to create competing AI systems, a restriction the company described as common in the industry.

Anthropic also pointed to national security. The company told Fortune it wants to prevent foreign adversaries from using Claude to strengthen their AI capabilities in ways that could hurt the U.S.

An Anthropic spokesperson said the U.S. and its allies have an advantage in frontier chips and the software needed to use them at full capacity. The spokesperson said the safeguards are meant to prevent Claude from weakening that advantage, including by helping optimize chips developed by adversaries.

The company said the restrictions do not apply to most coding and machine-learning work. Anthropic also acknowledged to Fortune that its earlier approach missed the mark, with a spokesperson saying the company “made the wrong tradeoff” and apologizing for failing to strike the right balance.

The Fable 5 reversal comes as Anthropic faces wider pressure over how it applies AI safety rules. Fortune reported that the company previously clashed with the Department of War after declining to provide full access to Claude models because it objected to language allowing use for mass surveillance and autonomous weapons.

The Department of War later labeled Anthropic a supply chain risk to national security, restricting defense contractors and military agencies from using its products, according to Fortune. Fortune also reported that earlier this month Secretary of War Pete Hegseth rejected Anthropic’s petition to change that designation, leaving a federal court fight unresolved.

Anthropic’s handling of Fable 5 is unfolding as the company, which has confidentially filed for an IPO, is valued at $965 billion, Fortune reported. The episode pits its safety-first public identity against demands from developers and researchers for clearer disclosure when model capabilities are being limited.

This story draws on original reporting from Fortune.