Companies turn to AI checks as agents take on more work

Businesses using agentic AI are facing a growing verification problem as automated systems take on more tasks. At Fortune Brainstorm Tech in Aspen, Colorado, executives said companies need clearer ways to trace AI work, spot errors and prove fixes when systems fail.

The discussion centered on accountability as companies weigh AI’s risks, including false outputs and agents that behave unpredictably, against competitive pressure to adopt the technology. Speakers from May Mobility, Thomson Reuters, Trustguard AI and SentinelOne described verification as a core operating challenge rather than a side issue.

Tracing AI decisions

Edwin Olson, founder and CEO of autonomous driving technology company May Mobility, said companies must build AI systems that perform as accurately as possible while accepting that errors will occur. He said businesses also need systems that let teams inspect what happened after a mistake, understand its cause and show regulators that the problem has been addressed.

The issue is especially acute in autonomous driving, where decisions can carry safety and regulatory consequences. Olson said May Mobility's vehicles use onboard systems that can model and compare multiple possible scenarios simultaneously before choosing a course of action.

Caitlin Halferty, chief data officer at Thomson Reuters, said AI users should be able to validate model outputs. She applies that standard to her own teams and encourages clients to do the same.

Thomson Reuters sells AI-enabled tools for professionals in areas including law and tax compliance, Halferty said. She said the company has emphasized accountability since the early stages of its AI work and treats transparency as one of four pillars of what it calls "fiduciary grade" products. The other three are data privacy and security, subject-matter expertise and dependable content.

Using AI to check AI

Several speakers pointed to systems that review one another as a way to improve reliability. Elena Kvochko, founder and CEO of Trustguard AI, described the approach as “LLM as a judge,” in which one AI system produces work and another looks for errors or inaccuracies.

Kvochko compared the setup to a newsroom, with one role focused on writing and another on editing. She said the checking function should sit in a separate AI system because a model should not be responsible for grading its own output.

The need for structured review is expected to rise as AI systems generate more material than people can inspect manually. Gregor Stewart, chief AI officer at SentinelOne, said organizations can end up with so much AI-produced work to audit that meaningful accountability becomes difficult.

Stewart pointed to software development as a field that is further along in addressing the problem than many others. He said teams are looking for ways to have AI agents reproduce review practices built over decades in safety-critical fields, rather than asking people to examine thousands of lines of AI-written code by hand.

Stewart said methods created for safety-critical technologies are likely to become more common in routine business practice. The executives’ shared message was that AI adoption will require audit trails, independent checks and systems designed for review from the start.

This story draws on original reporting from Fortune.