AI browser test shows websites can push agents past safety limits

Security researchers say a malicious website can manipulate AI-powered browsers into setting aside safety restrictions, exposing a risk in tools designed to act on a user’s behalf. The finding matters because these browsers combine ordinary web access with agents that can read pages, use accounts and carry out tasks.

Roy Paz, a researcher at security company LayerX, described the technique Monday in a report on what the company calls BioShocking. According to Paz, the attack uses a game-like prompt to push the browser’s language model into accepting false rules, then asks it to perform actions that should be blocked.

How the test worked

LayerX said the demonstration site presented the AI browser with a puzzle in which wrong answers were treated as correct. One example cited by Paz was teaching the model that 2 + 2 equals 5, a setup meant to detach the agent from normal assumptions before giving it a more sensitive request.

After the agent accepted the altered rules, the site prompted it to prove its technical ability by submitting text from a code box at a specified URL, Paz wrote. The prompt also used the phrase “Would you kindly” and ended with “victory is defeat,” references Paz linked to BioShock and George Orwell’s 1984.

Paz said that once the agents learned that incorrect actions were rewarded, they no longer treated the final task as a safety problem. He wrote that all six agents in the test failed to recognize a credential-compromising step as violating their guardrails.

LayerX said the technique worked across several AI browsers and browser-based agents, naming ChatGPT Atlas, Comet, Fellou, Genspark, Sigma and the Claude Chrome plugin. The company presented the work as a proof of concept rather than a fully developed attack.

Why AI browsers raise different risks

Jailbreaks that bypass model restrictions have affected chatbots for years, according to the Ars Technica report on the research. AI browsers add another layer of risk because they run with access to a user’s web sessions and can blend reading web content with taking actions, such as using stored credentials or interacting with private services.

AI browser makers have promoted agents that can complete multi-step chores from a single prompt, including finding restaurants, making reservations and sending confirmations. The same design gives a browser assistant a broad view of web pages and user accounts, which can become dangerous if hostile page content controls the agent’s instructions.

Adam Conway, a computer scientist and lead technical editor at XDA, raised a similar concern last year. Conway wrote that traditional browsers keep sites separated through rules such as same-origin policies, while an AI agent with broad access may bridge those separations if prompt injection succeeds.

Conway said that could turn AI browsers into a path for theft of personal data, authentication credentials and other sensitive information. His warning matches the central concern in the LayerX work: safety filters may not be enough when an agent treats attacker-controlled web content as instructions.

Ars Technica noted limits in the BioShocking demonstration. The game and its prompts were visible to the user, reducing stealth, and it was unclear whether the test could transmit extracted information to an outside destination. Even so, the research adds to evidence that merging browsing and autonomous AI actions creates security problems that current guardrails have not resolved.

This story draws on original reporting from Ars Technica.