AI agents need audit-ready proof as companies give them more power

Technology Innovation Institute CEO Najwa Aaraj warned that companies are giving AI agents responsibilities that require stronger proof of proper behavior than policies or vendor assurances can provide. In a commentary published by Fortune on June 23, Aaraj said the issue has become urgent as agents move from producing answers to taking actions inside business systems.

Aaraj wrote that earlier enterprise tests of generative AI focused on whether models could summarize documents, assist customers, or support analysts and clinicians. She said those capabilities still matter, but agents now also retrieve sensitive data, use software tools and APIs, edit records, and operate in live workflows.

That shift changes the risk, according to Aaraj. She said a chatbot error is often caught and corrected, while an agent that transfers funds, changes a hospital record, or sends code into production can create damage that is harder to contain.

Aaraj argued that enterprises need to know, and be able to prove, which model and code ran, where the agent executed, what information it touched, and whether it stayed within approved boundaries. She described AI agents as, in effect, “non-human insiders”: they may hold privileges across email, databases, code repositories, and financial systems, even though they have no intent.

Limits of current oversight

Human review still has a role in sensitive cases, Aaraj said, but she argued that putting a person in front of every agent action would undermine the productivity gains companies seek. The better approach, in her view, is to let agents operate only within limits that are enforceable and provable.

Aaraj said many companies have built governance programs for AI agents, including oversight committees, incident reviews, agent registries, identity controls, policy enforcement, and activity logs. She wrote that those steps are needed but do not amount to independent verification.

As an example, Aaraj pointed to a finance agent allowed to update vendor records and route payments through an enterprise resource planning system. A policy may restrict the agent to approved records and tools and require human approval for some decisions, she said, but the policy itself does not prove that those limits were followed.
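To make the example concrete, here is a minimal sketch of the kind of run-time policy check such a finance agent might pass through. Everything in it is hypothetical and not drawn from Aaraj's commentary: the `Policy` and `Action` types, the tool names, the vendor IDs, and the rule that payments at or above a threshold need human sign-off.

```python
from dataclasses import dataclass


# Hypothetical policy for the finance-agent example; names and thresholds are illustrative only.
@dataclass(frozen=True)
class Policy:
    approved_tools: frozenset
    approved_vendor_ids: frozenset
    approval_threshold: float  # payments at or above this amount need a human sign-off


@dataclass
class Action:
    tool: str
    vendor_id: str
    amount: float = 0.0
    human_approved: bool = False


def check(policy: Policy, action: Action) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent action."""
    if action.tool not in policy.approved_tools:
        return False, f"tool {action.tool!r} not approved"
    if action.vendor_id not in policy.approved_vendor_ids:
        return False, f"vendor {action.vendor_id!r} not approved"
    if action.amount >= policy.approval_threshold and not action.human_approved:
        return False, "human approval required"
    return True, "ok"
```

For instance, with `Policy(frozenset({"route_payment"}), frozenset({"V-100"}), 10_000.0)`, a 25,000 payment to vendor `V-100` without sign-off returns `(False, "human approval required")`. The check enforces the rule while the agent runs, but nothing in it lets an auditor confirm afterward that it actually ran, which is the gap Aaraj describes.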

Logs can help, Aaraj wrote, but she said they may be incomplete, spread across systems, or difficult to validate on their own. She argued that auditors, regulators, or partners should be able to confirm that the correct model version ran in a protected environment, that the agent used only authorized data, and that required approvals were obtained before it acted.

Technical proof

Aaraj said existing technologies can support that kind of proof. She cited confidential computing, hardware-based attestation, cryptographic records, and strong identity frameworks as tools that can help verify which agent operated, under what conditions, and with what permissions.
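One simple form of cryptographic record is a hash-chained log, in which each entry commits to the hash of the entry before it. The sketch below is a generic illustration under that assumption, not a technique Aaraj specifies; the function names and event fields are hypothetical. A bare hash chain detects edits, reordering, and deletions from the middle, but on its own it does not prove who wrote the log or catch truncation of the newest entries. Real systems would typically sign entries or anchor the latest hash somewhere the operator cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list) -> bool:
    """Recompute every link; an edited, reordered, or removed middle entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Changing a single field in an earlier entry, such as the vendor ID on a recorded payment, makes `verify` return `False` for the whole log.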

In her view, these technologies should work alongside AI control planes rather than replace them. Aaraj said control planes can set policy and record activity, while attestation can let outside parties confirm that the controls held without relying only on a platform’s own claims.

Aaraj said demand for independent evidence will be greatest in fields where accountability affects adoption, including banking, health care, government, defense, critical infrastructure, and sovereign AI programs. She also called for open standards because companies use multiple clouds, models, and agent frameworks, and she said trust should not depend on a single vendor.

Aaraj added that long-lived AI systems will need cryptography that can change as threats, regulations, and standards change. She said quantum-era risk is another reason for organizations handling high-value data to design AI infrastructure with cryptographic agility.
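Cryptographic agility usually means building records so the algorithm can be swapped without changing the record format. A minimal sketch of that pattern, assuming digests that carry their own algorithm name and a verifier-side allow-list (`ACCEPTED`, `make_record`, and `check_record` are hypothetical names), might look like this. It shows only the algorithm-identifier pattern; it does not by itself provide quantum resistance.

```python
import hashlib
import hmac

# Algorithms this verifier currently accepts; the set can change without changing the record format.
ACCEPTED = {"sha256", "sha3_256"}


def make_record(data: bytes, alg: str = "sha3_256") -> dict:
    """Store the algorithm name alongside the digest so it can be rotated later."""
    return {"alg": alg, "digest": hashlib.new(alg, data).hexdigest()}


def check_record(data: bytes, record: dict) -> bool:
    """Accept the record only if its algorithm is still allowed and the digest matches."""
    alg = record["alg"]
    if alg not in ACCEPTED:
        return False  # algorithm retired (or never approved) by current policy
    expected = hashlib.new(alg, data).hexdigest()
    return hmac.compare_digest(expected, record["digest"])
```

Retiring an algorithm then becomes a change to `ACCEPTED` plus re-issuing affected records, rather than a redesign of every stored record.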

This story draws on original reporting from Fortune.