Team Ferentin
April 14, 2026

Trustworthy Agents Need Zero Trust Infrastructure

[Illustration: four agents (OpenAI, Claude, Gemini, LangGraph) connecting through a central Zero Trust plane to scoped tools]

Anthropic's Trustworthy Agents framework names the right problem. Once an agent starts taking actions on your behalf, trust cannot be a training artifact. It has to be enforced at runtime. Their five pillars (human control, value alignment, secure interactions, transparency and privacy) describe what trustworthy agents look like.

Ferentin is how you deploy them. We provide Zero Trust access to LLMs and MCP tools, built for agents.

The security stack wasn't built for agents

Enterprises have spent a decade building a network security stack for humans and apps. Secure Web Gateways, Security Service Edges, CASBs, next-gen firewalls, DLP. It's a lot of infrastructure. For traditional traffic it works.

For agents, it sees nothing that matters.

These tools aren't dumb. They carry the signals a modern identity-aware perimeter depends on. An SWG or SSE knows the authenticated user, whether the device is enrolled and managed, the device's posture, the geolocation, the time of day. A CASB classifies the destination, tags the SaaS tenant and feeds allow/deny decisions that work well for human traffic. What none of them can tell you is which agent, or which sub-agent in a multi-step plan, actually generated the call. To the stack, a request from ChatGPT reaching api.chatgpt.com or mcp.box.com looks identical to a request from a LangGraph worker hitting the same endpoint, as long as both came from the same user on the same managed device. The user is known. The device is known. The agent is anonymous.

Point solutions are closing parts of this gap, and each one is genuinely useful. GenAI gateways and LLM firewalls like Portkey, Kong AI Gateway, Cloudflare AI Gateway and a growing roster of prompt-security startups sit in front of LLM APIs and can read prompts and responses. What they identify is an API key or an OAuth client, not an agent. Workload identity planes like SPIFFE can cryptographically name a headless worker, but only for workloads you run yourself. They don't help for ChatGPT in a browser or Claude Desktop on the device. Enterprise browsers can tag a tab as "ChatGPT". EDR can name a process. Neither can distinguish the sub-agents inside a multi-step plan. Each of these solves a slice. None of them closes the loop.

The deeper reason is architectural. Agent identity is an application-layer assertion. The agent, or its host, has to present it at the call boundary, cryptographically bound to a session. No bump-in-the-wire box can derive it from packets. And even once you know which agent is calling, you still need to read the MCP session to see what it asked for. Which tool. Which arguments. Which document. Which field. A firewall sees an allowed egress. DLP inspects file uploads, not the JSON body of an MCP tools/call. The existing stack is perimeter-aware and protocol-agnostic. Agents are the opposite. They live inside the allowed perimeter, under a legitimate user's identity, and their intent is encoded in application-layer payloads the network stack was never built to parse.
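To see why the network stack is blind here, consider what an MCP tools/call actually looks like on the wire. Below is a minimal sketch of a representative JSON-RPC body (the tool name and arguments are illustrative, not a real Box integration): a firewall records only an allowed HTTPS egress, while a policy plane has to parse this structure to recover intent.

```python
import json

# Representative JSON-RPC body of an MCP tools/call request.
# Tool name and arguments are illustrative. DLP inspects file
# uploads; it never parses a structure like this.
request_body = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "box_download_file",          # which tool
        "arguments": {"file_id": "1234567"},  # which document
    },
})

# What an application-layer policy engine extracts from the call:
msg = json.loads(request_body)
print(msg["method"], msg["params"]["name"], msg["params"]["arguments"])
```

Everything a policy decision needs (which tool, which arguments, which document) lives inside that body, which is exactly the layer a five-tuple-aware box never opens.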

You can't Zero Trust what you can't see. And you can't see agent behavior with a box that only understands five-tuples, hostnames and user identity.

What this looks like in practice

To make it concrete, picture four very different agents running side by side on a single device. Same device. Same user. Same network egress. Each has scoped access to exactly the tools it needs, and nothing else.

  • ChatGPT, a browser-based assistant, reaches Asana and Box.
  • Claude Desktop, a native desktop assistant, reaches Box.
  • Gemini CLI, a terminal-based assistant, reaches Linear.
  • A custom LangGraph agent, headless Python, reaches Okta.

Four runtimes, four trust profiles, one policy plane. ChatGPT can't touch Linear. The LangGraph agent can't see a Box folder. Claude Desktop can read documents but can't escalate into Okta. Every call, whether from a browser tab, a native app, a shell or a Python loop, is authenticated, authorized and audited the same way. Nothing exceeds its bounds, because the bounds aren't set by the client. They are set by the plane in front of it.
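The scoping above can be sketched as a deny-by-default allowlist keyed by agent identity. This is a minimal illustration, not Ferentin's policy language; the agent and tool names are made up for the example.

```python
# Per-agent tool allowlist, assuming agent identity is asserted
# at the call boundary. All identifiers are illustrative.
POLICY = {
    "chatgpt-browser":  {"asana", "box"},
    "claude-desktop":   {"box"},
    "gemini-cli":       {"linear"},
    "langgraph-worker": {"okta"},
}

def authorize(agent_id: str, tool: str) -> bool:
    """Deny by default: unknown agents and unlisted tools are refused."""
    return tool in POLICY.get(agent_id, set())

assert authorize("claude-desktop", "box")
assert not authorize("chatgpt-browser", "linear")   # ChatGPT can't touch Linear
assert not authorize("langgraph-worker", "box")     # LangGraph can't see Box
assert not authorize("unknown-agent", "box")        # anonymous callers fail closed
```

The important property is the default: an agent the plane has never heard of gets nothing, rather than inheriting the user's standing access.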

To the network stack, this looks like one device making HTTPS calls to a handful of sanctioned SaaS endpoints. Indistinguishable from any other Tuesday. To Ferentin, it's four distinct agent identities under four distinct policies, enforced per call.

When the session ends, the proof is in the receipts. Every call each agent made is recorded as a signed, tamper-evident entry: which tool, which arguments, which policy verdict, on whose behalf. The log isn't just present. It's verifiable. An auditor can replay the session weeks later and cryptographically confirm that ChatGPT never touched Linear, that the LangGraph agent never reached Box, that nothing slipped through. That's the gap a new plane closes.
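A minimal sketch of such a tamper-evident log: each record is chained to its predecessor and signed at capture time. HMAC is used here for brevity; a production design would sign with an asymmetric key so auditors never hold the signing key. Keys, record fields, and agent names are all illustrative.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # illustrative; real systems use asymmetric keys

def append_entry(log: list, record: dict) -> None:
    """Chain each record to its predecessor and sign it at capture time."""
    prev = log[-1]["sig"] if log else ""
    body = json.dumps({**record, "prev": prev}, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    log.append({"body": body, "sig": sig})

def verify(log: list) -> bool:
    """Replay the chain; any edited or dropped entry breaks verification."""
    prev = ""
    for entry in log:
        expect = hmac.new(SIGNING_KEY, entry["body"].encode(),
                          hashlib.sha256).hexdigest()
        if expect != entry["sig"] or json.loads(entry["body"])["prev"] != prev:
            return False
        prev = entry["sig"]
    return True

log = []
append_entry(log, {"agent": "chatgpt-browser", "tool": "asana_list_tasks",
                   "verdict": "allow", "user": "alice"})
append_entry(log, {"agent": "langgraph-worker", "tool": "okta_list_users",
                   "verdict": "allow", "user": "alice"})
assert verify(log)

log[0]["body"] = log[0]["body"].replace('"allow"', '"deny"')  # tamper with a verdict
assert not verify(log)
```

Because each signature covers the previous one, rewriting history means re-signing everything after the edit, which the verifier detects.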

Zero Trust, applied to agents

Ferentin sits where the network stack goes dark: at the agent-to-LLM and agent-to-tool boundary, speaking the protocols the rest of the stack doesn't. Classic Zero Trust rests on three moves. Never trust, always verify. Least privilege by default. Assume breach. Apply those to an agent reaching for a model or a tool, and you get our architecture.

Never trust the caller. Every request to an LLM or an MCP server is authenticated and authorized at the edge. We distinguish user mode (interactive, OAuth2 with consent) from agent mode (automation, tenant-bound) so policies can treat a human-in-the-loop differently from a headless workload, even on the same tool.
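The user-mode/agent-mode distinction can be sketched as a decision function where the same tool resolves to different verdicts depending on whether a human is in the loop. The mode names, tool names, and verdicts below are illustrative.

```python
# Mode-aware authorization sketch: identical tool, different verdict
# depending on caller mode. All identifiers are illustrative.
DESTRUCTIVE = {"box_delete_file", "okta_deactivate_user"}

def decide(mode: str, tool: str) -> str:
    if tool in DESTRUCTIVE:
        # An interactive user confirms inline; a headless agent
        # is routed through an explicit approval gate instead.
        return "elicit_confirmation" if mode == "user" else "require_approval"
    return "allow"

assert decide("user", "box_download_file") == "allow"
assert decide("user", "box_delete_file") == "elicit_confirmation"
assert decide("agent", "box_delete_file") == "require_approval"
```

The point is that mode is an input to the policy, not metadata: a human-in-the-loop and a headless workload hitting the same tool can legitimately get different answers.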

Least privilege, per call. Tool permissions, data-class restrictions, provider routing and approval gates are policy, not code. Policies are authored centrally and hot-reload at the edge in seconds. When a tool needs mid-flight input (a credential choice, a destructive-action confirmation, an OAuth scope) we use MCP elicitations so sensitive data never flows through the model's context. Human-in-the-loop by protocol, not by convention.
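An elicitation, concretely, is the server asking the client directly for structured input, so the answer never enters the model's context. The sketch below follows the shape of the MCP elicitation capability; the message and schema contents are illustrative.

```python
import json

# Sketch of an MCP elicitation request: a server-to-client ask for a
# structured confirmation. The method name follows the MCP elicitation
# capability; message and schema values are illustrative.
elicitation = json.dumps({
    "jsonrpc": "2.0",
    "id": 12,
    "method": "elicitation/create",
    "params": {
        "message": "Confirm deletion of report-q3.pdf",
        "requestedSchema": {
            "type": "object",
            "properties": {"confirm": {"type": "boolean"}},
            "required": ["confirm"],
        },
    },
})

print(json.loads(elicitation)["method"])
```

The client renders the prompt and returns the typed answer out of band, which is what makes the confirmation a protocol guarantee rather than a prompt-engineering convention.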

Assume breach. Defense in depth across the request path: content sanitization against prompt injection, mTLS between edges and the control plane, strict per-tool-view CSP for MCP Apps, WAF in block mode at the perimeter, tenant DEK encryption for payloads at rest. No single layer is the answer. The stack is.

Verify everything, forever. Every LLM call, every MCP invocation, every admin change and every login lands in an immutable, tenant-isolated audit trail, on both the cloud path and the customer-edge path. Records are cryptographically signed at the moment of capture, so anyone with the verification key can prove, after the fact, exactly what the agent did and that the log hasn't been altered. If you can't answer what the agent did, on whose behalf, and under which policy, and prove the answer, you don't have a trustworthy agent. You have a liability.

Keep data where it belongs. Ferentin gives customers a choice. On the private edge, the service edge runs inside the customer's own infrastructure, talks directly to LLM providers and ships only audit metadata back to the control plane. Prompts and responses never transit our cloud. On the public edge, traffic flows through Ferentin's cloud under tenant-isolated encryption, per-request policy enforcement and the same audit trail. Data sovereignty becomes a deployment decision, not a vendor negotiation.

Why this maps to Anthropic's pillars

Zero Trust isn't a competing framework. It's the operational shape of Anthropic's five. Human control becomes policy and elicitation. Value alignment becomes mode-aware authorization. Secure interactions become defense in depth. Transparency becomes continuous, verifiable audit. Privacy becomes a deployment choice.

Ferentin is MCP-native end-to-end, so enterprises inherit these controls from an open standard rather than a bespoke protocol. That's exactly the ecosystem outcome Anthropic calls for.

Trustworthy agents need trustworthy infrastructure. Zero Trust is what that looks like when you build it for real, at the layer where agents actually live.
