Edge LLM Routing vs Cloud Gateway

Same trust layer, two deployment topologies. Run the gateway in the operator's cloud or in your own VPC. The choice is rarely about features. It is about who is on the data path, what crosses your perimeter and which regulator you are answering to. This guide is a decision framework, with the technical deltas that matter for regulated and multi-region deployments.

Two deployment topologies compared: Cloud Gateway (data crosses customer perimeter to operator cloud) vs Customer-Private Edge (data path stays inside the customer VPC; only telemetry leaves to the control plane).

Two topologies

Same gateway code, two deployment shapes. The line that matters is the customer perimeter.

At a glance

How the two topologies compare across the dimensions that drive the decision.

Dimension	Cloud Gateway	Customer-Private Edge
Data path	Customer → public internet → managed gateway → LLM provider. Prompts and responses transit the gateway operator's cloud.	Customer VPC → in-VPC edge → LLM provider. Prompts and responses never leave the customer perimeter.
Egress shape	Full request and response bodies plus headers cross the customer boundary by design.	Only signed telemetry (token counts, model, latency, policy decisions, audit hashes) leaves the perimeter. No prompt or response content unless explicitly enabled.
Provider key location	Stored and decrypted in the gateway operator's cloud at request time.	Encrypted at rest in the customer VPC. Decrypted by the edge process per request, never logged, never observed by the control plane.
Audit log location	Hosted log in the gateway operator's cloud. Customer pulls or streams it back.	Local audit buffer in the customer VPC. Signed receipts mirrored to the control plane. The full payload, if recorded, stays customer-side.
Failure mode (control plane unreachable)	Gateway is the data path. Control plane outage stops LLM traffic.	Edge keeps serving with the cached policy bundle. Telemetry buffers locally and ships when reachability returns.
Latency	Adds one network hop to the gateway region. Cross-region calls pay full RTT.	In-region or in-VPC. Latency overhead is sub-10ms when co-located with the calling agent.
Multi-region	Bound to the gateway operator's region map. Data crosses regions to reach the gateway.	One edge per region (EU, US, India, AU, etc.). Each region is self-contained for data path and local audit.
Compliance posture	Customer inherits the gateway operator's SOC 2, ISO and BAA scope. Data residency is whatever the operator offers.	Customer keeps the data residency story. The trust-layer SOC 2 covers the control plane. The customer's own controls cover the in-VPC plane.
Operational ownership	Operator runs and patches the gateway. Customer runs nothing.	Operator ships the edge image and the control plane. Customer runs the edge in their VPC. Routine behavior changes arrive as signed policy bundles, not code deploys.
Subprocessor footprint	The gateway operator is a data-processing subprocessor for every LLM call.	The control plane is a metadata-processing subprocessor only. The data path has no third-party processor.
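The provider-key row describes a key that is encrypted at rest, decrypted per request and never logged. A minimal sketch of the "never logged" half, assuming the decryption itself is delegated to a hypothetical `decrypt` hook (whatever KMS or key-encryption key the deployment uses): the plaintext lives in a wrapper whose `str()` and `repr()` are redacted, and it is dropped when the request ends. `ProviderKey` and `provider_key_for_request` are illustrative names, not part of any product API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class ProviderKey:
    """Holds a decrypted provider key; str() and repr() never show the value."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only accessor for the plaintext; call it where the provider
        # request is built, never where it is logged.
        return self._value

    def __repr__(self) -> str:
        return "ProviderKey(<redacted>)"

    __str__ = __repr__


@contextmanager
def provider_key_for_request(
    ciphertext: bytes, decrypt: Callable[[bytes], str]
) -> Iterator[ProviderKey]:
    """Decrypt the at-rest key for one request and discard it afterwards.

    `decrypt` is a hypothetical hook; only `ciphertext` is stored in the VPC.
    """
    key = ProviderKey(decrypt(ciphertext))
    try:
        yield key
    finally:
        # Drop the plaintext once the request finishes. Python does not
        # guarantee the old string is wiped from memory; this only bounds
        # how long the wrapper exposes it.
        key._value = ""
```

Inside a `with provider_key_for_request(blob, kms_decrypt) as key:` block (where `kms_decrypt` is hypothetical), the request builder calls `key.reveal()`, while an accidental `logging.info("key=%s", key)` records only the redacted form.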

Delta 1

Egress shape

A cloud gateway is on the data path. By construction, every prompt and every response crosses the customer perimeter to reach it. That is the price of admission. The operator is now a data-processing subprocessor for every LLM call your enterprise makes, and the prompt body is the asset that crosses the boundary.

The customer-private edge inverts the topology. The gateway runs inside the customer VPC. Prompts and responses flow between the agent, the edge and the LLM provider without ever touching the operator's infrastructure. What does cross the customer boundary is signed telemetry: token counts, model identifier, provider region, latency, policy decision outcomes and the hash of an audit receipt. The control plane processes metadata, not content.

This is the difference between a data-processing subprocessor and a metadata-processing subprocessor. That distinction carries straight into your DPIA, your subprocessor disclosures and your customer-facing data-flow diagrams. For workloads where the prompt body itself is regulated (PHI in clinical workflows, source code under export control, financial records subject to bank secrecy), the egress-shape difference is the entire reason to choose edge.

Delta 2

Failure mode

When a cloud gateway is unreachable, your LLM traffic stops. The gateway and the policy plane are the same thing. There is no local fallback, because no copy of your policy bundle ever lives on your premises. The blast radius of a gateway-side incident is your entire AI surface.

The edge is structured as a stateless data plane that pulls its policy from a control plane. The policy bundle is cached locally and signed by the control plane. Identity claims are validated against a cached JWKS. Audit receipts are written to a local buffer that drains to the control plane when reachable. When the control plane is unreachable, the edge keeps serving with the last-known-good bundle and queues telemetry. From the application's perspective, nothing changes.
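The "queue telemetry, drain when reachable" behavior can be sketched as a small buffer whose `record` call never waits on the control plane. This is a minimal sketch assuming a hypothetical `send` transport that returns `True` on delivery and `False` when the control plane is unreachable. It is in-memory and unbounded for brevity; a production buffer would presumably persist to disk and enforce a size limit rather than risk losing audit data.

```python
from collections import deque
from typing import Callable, Deque


class TelemetryBuffer:
    """Queues telemetry locally and drains it when the control plane answers."""

    def __init__(self, send: Callable[[dict], bool]) -> None:
        # `send` is a hypothetical transport: True on delivery, False if the
        # control plane is unreachable.
        self._send = send
        self._queue: Deque[dict] = deque()

    def record(self, event: dict) -> None:
        # Called on the request path: never waits on the control plane.
        self._queue.append(event)

    def drain(self) -> int:
        """Ship queued events in order; stop at the first failure. Returns count sent."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # still unreachable: keep the event for the next attempt
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

Events leave in the order they were recorded, and an event is removed from the queue only after `send` confirms delivery, so an outage delays telemetry without reordering or dropping it.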

This is the same pattern a CDN edge uses to keep serving cached content when its origin is unreachable. The data plane is decoupled from the control plane. The trust boundary is the signature on the bundle, not network reachability to the control plane. The practical consequence is that an outage in the operator's control plane does not become an outage of your application.
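A minimal sketch of "the trust boundary is the signature, not reachability": a freshly pulled bundle replaces the cached one only if its signature verifies and its version is newer; a failed pull or a bad signature leaves the last-known-good bundle in place. The integer `version` field is a hypothetical bundle format. HMAC-SHA256 keeps the sketch stdlib-only, but it is an assumption: a real control plane would presumably sign with a private key (e.g. Ed25519) so that the edge, holding only the public key, cannot mint bundles itself.

```python
import hashlib
import hmac
import json
from typing import Optional


class PolicyCache:
    """Keeps the last verified policy bundle and serves it regardless of reachability."""

    def __init__(self, verify_key: bytes) -> None:
        self._verify_key = verify_key
        self._bundle: Optional[dict] = None

    def offer(self, raw: Optional[bytes], signature: Optional[str]) -> bool:
        """Try to install a freshly pulled bundle; return True if it was accepted.

        raw=None models a failed pull (control plane unreachable).
        """
        if raw is None or signature is None:
            return False  # keep serving the last-known-good bundle
        expected = hmac.new(self._verify_key, raw, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature):
            return False  # bad signature: reject, keep the cached bundle
        bundle = json.loads(raw)
        if self._bundle is not None and bundle["version"] <= self._bundle["version"]:
            return False  # refuse rollback to an older (or the same) bundle
        self._bundle = bundle
        return True

    def current(self) -> dict:
        if self._bundle is None:
            # No verified bundle has ever been installed: fail closed.
            raise RuntimeError("no verified policy bundle")
        return self._bundle
```

The version check guards against replaying an older, validly signed bundle; without it, an attacker on the network path could roll the edge back to a weaker policy.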

Delta 3

Compliance posture

A cloud gateway transfers a meaningful fraction of your compliance work to the operator. The operator's SOC 2, ISO 27001 and BAA cover the data path. That is convenient for general workloads. It is also a hard ceiling: you inherit whatever residency the operator offers, whatever regions they support and whatever subprocessors they use. If your regulator is stricter than your operator's baseline, you have a problem.

On the customer-private edge, compliance for the data plane stays on your side of the line. The operator's SOC 2 covers the control plane (metadata processing). Your existing controls, attestations and residency commitments cover the in-VPC plane. EU GDPR strict-residency tenants, HIPAA-covered clinical workflows, FedRAMP boundaries, India DPDP cross-border restrictions, China PIPL workloads and high-risk EU AI Act systems all become tractable, because the prompt body never leaves the boundary that regime governs.

Multi-region is the same pattern. One edge in the EU, one in the US, one in India, one in Australia. Each is self-contained for data path and local audit. The control plane sees telemetry from all of them. The customer-facing data-flow diagram has one fewer cross-border arrow per region.
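Per-region routing can be sketched as a lookup from a tenant's residency region to the edge deployed there. The endpoint URLs and region codes below are hypothetical. The design choice worth noting is the absence of a fallback: if no edge exists in the tenant's region, the sketch refuses rather than routing to a neighboring region, because a cross-region fallback would reintroduce exactly the cross-border arrow the edge is meant to remove.

```python
# Hypothetical per-region edge endpoints; one self-contained edge per region.
REGIONAL_EDGES = {
    "eu": "https://llm-edge.eu.example.com",
    "us": "https://llm-edge.us.example.com",
    "in": "https://llm-edge.in.example.com",
    "au": "https://llm-edge.au.example.com",
}


def edge_for(residency_region: str) -> str:
    """Return the edge inside the tenant's residency region, or refuse.

    There is deliberately no fallback to another region: routing a request
    cross-border would defeat the purpose of a per-region edge.
    """
    edge = REGIONAL_EDGES.get(residency_region.lower())
    if edge is None:
        raise LookupError(f"no edge deployed in region {residency_region!r}")
    return edge
```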

Regimes that often force the edge

HIPAA-covered workflows touching PHI
FedRAMP Moderate / High boundaries
EU GDPR strict residency (DE, FR public sector)
India DPDP cross-border data restrictions
China PIPL data-export scrutiny
EU AI Act high-risk system audit trails
Bank secrecy regimes (CH, SG, financial services)
Source-code export-control workloads

Regimes the cloud usually handles

SOC 2 Type II for general developer workloads
ISO 27001 / 27701
GDPR for non-special-category data with SCCs
CCPA / CPRA
Standard BAA for low-PHI exposure
Commercial customer-data flows with a DPA

Decision framework

Which topology should we run?

Two questions get most teams to the right answer. The first is regulatory: does any reasonable read of your data classification or residency obligation require the prompt body to stay inside a specific perimeter? The second is operational: do you have an SRE function that can run a stateless container alongside your other services?

Cloud Gateway

No regulated data class on the prompt path. SOC 2, ISO and a DPA cover your obligations. Pick this when speed-to-deploy matters and the operator's residency map is sufficient.

Hybrid

Most workloads on the cloud gateway. Edge only for tenants tagged with regulated data (PHI, classified, special-category personal data). Tenant routing is policy-driven, and the same identity, RBAC and audit semantics apply on both planes.
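The hybrid routing rule reduces to one decision per tenant: if any of its data-class tags is regulated, it goes to the edge; otherwise it stays on the cloud gateway. A minimal sketch follows; the tag strings and the `Plane` enum are hypothetical, chosen to mirror the data classes listed above.

```python
from enum import Enum
from typing import Iterable


class Plane(Enum):
    CLOUD_GATEWAY = "cloud-gateway"
    PRIVATE_EDGE = "customer-private-edge"


# Hypothetical tag names for the regulated data classes named above.
REGULATED_TAGS = frozenset({"phi", "classified", "special-category-personal-data"})


def plane_for(tenant_tags: Iterable[str]) -> Plane:
    """Send a tenant to the edge if any of its tags is a regulated data class."""
    tags = {t.lower() for t in tenant_tags}
    return Plane.PRIVATE_EDGE if tags & REGULATED_TAGS else Plane.CLOUD_GATEWAY
```

Because identity, RBAC and audit semantics are the same on both planes, the routing decision only changes where the request is served, not how it is governed. This sketch sends untagged tenants to the cloud gateway, matching the hybrid default described above; a more conservative organization might invert that default.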

Customer-Private Edge

Regulated workloads, residency-bound regions or multi-region with sovereignty constraints. The data path stays in your VPC, egress is telemetry-only, and each region runs its own edge. The SOC 2 and BAA boundaries stay where you want them.

FAQ

What leaves the perimeter when the gateway runs as a customer-private edge?

Signed telemetry: token counts, model identifier, latency, policy decision outcomes and audit-receipt hashes. The prompt body, the response body and tool-call payloads do not leave the perimeter unless you explicitly enable payload mirroring for a tenant or workload (e.g. for centralized eval). The default is metadata-only egress.
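The "metadata-only by default, payload mirroring by explicit opt-in" rule can be sketched as an egress filter. This minimal sketch uses hypothetical field names and an allowlist rather than a denylist: a field that nobody has classified (such as a new debug field) stays inside the perimeter until someone deliberately adds it. Mirroring can be enabled per tenant or per workload, mirroring the wording above.

```python
from dataclasses import dataclass

# Allowlist: only these fields leave the perimeter by default.
METADATA_FIELDS = frozenset({
    "request_id", "model", "prompt_tokens", "completion_tokens",
    "latency_ms", "policy_decision", "audit_receipt_sha256",
})
# Content fields that leave only when mirroring is explicitly enabled.
PAYLOAD_FIELDS = frozenset({"prompt", "response", "tool_calls"})


@dataclass(frozen=True)
class MirroringPolicy:
    # Hypothetical opt-in lists; both empty means metadata-only egress.
    tenants: frozenset = frozenset()
    workloads: frozenset = frozenset()

    def allows(self, tenant: str, workload: str) -> bool:
        return tenant in self.tenants or workload in self.workloads


def egress_view(record: dict, tenant: str, workload: str, policy: MirroringPolicy) -> dict:
    """Return only the fields of `record` that may cross the perimeter."""
    if policy.allows(tenant, workload):
        allowed = METADATA_FIELDS | PAYLOAD_FIELDS
    else:
        allowed = METADATA_FIELDS
    return {k: v for k, v in record.items() if k in allowed}
```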

Keep the data path in your perimeter

Same trust layer. Run it in your VPC. Telemetry-only egress to the control plane. Per-region by design.
