What the LiteLLM Supply Chain Attack Revealed About AI Credential Security
In our previous post, Zero Trust for LLMs Explained, we laid out the principles: every LLM request must be authenticated, authorized and continuously validated. On March 24, 2026, those principles were tested in production. For many organizations, they were absent.
The Incident
On March 24, security researchers discovered that two versions of the LiteLLM Python package on PyPI, 1.82.7 and 1.82.8, contained a multi-stage backdoor. The threat actor group TeamPCP had compromised the package maintainer's publishing credentials through a prior supply chain attack on Trivy, an open-source security scanner used in LiteLLM's CI/CD pipeline. That earlier attack was assigned CVE-2026-33634 with a CVSS score of 9.4.
The malicious code used Python's .pth mechanism to achieve persistence, executing whenever Python was invoked, surviving package reinstalls and virtual environment recreation. The payload exfiltrated credentials to attacker-controlled infrastructure.
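The .pth mechanism is part of Python's own startup sequence: site.py executes any line in a .pth file that begins with `import`, every time the interpreter boots. A benign sketch of the hook, simulated with `site.addpackage` against a temporary directory rather than a real site-packages install (the marker name is illustrative):

```python
import os
import site
import tempfile

# A .pth line starting with "import" is exec'd by site.py at every
# interpreter start -- the persistence hook the payload abused. Here the
# "payload" just sets a harmless environment marker.
sitedir = tempfile.mkdtemp()
with open(os.path.join(sitedir, "demo.pth"), "w") as f:
    f.write("import os; os.environ['PTH_HOOK_RAN'] = '1'\n")

# Simulate the startup processing of that directory's .pth files.
site.addpackage(sitedir, "demo.pth", known_paths=None)
print(os.environ.get("PTH_HOOK_RAN"))  # -> 1
```

Because the hook fires on every interpreter start, it survives package reinstalls and fresh virtual environments, exactly as observed in the incident.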
LiteLLM is downloaded roughly 3.4 million times per day. Research from Sonatype found it present in 36% of cloud environments. The malicious versions were available for approximately three hours before PyPI quarantined the package.
Three hours was enough. But the severity of the incident was not determined by the sophistication of the malware or the speed of the response. It was determined by a simpler question: what credentials did LiteLLM have access to?
More Than LLM Keys
LiteLLM began as an LLM proxy, a translation layer that normalizes API calls across providers. But modern AI gateways have grown well beyond that. LiteLLM, Portkey and similar tools now function as MCP gateways, credential vaults and integration hubs. They hold the keys to everything the AI stack touches.
A typical deployment stores several categories of credentials, all in the same process, all accessible through the same trust boundary:
LLM provider API keys. OpenAI, Anthropic, Google, Azure and AWS Bedrock. Each key is broadly scoped, providing full access to every model, every endpoint and every token of capacity. A single leaked key can generate six-figure bills in hours through automated requests. There is no IP restriction, no operation-level scoping and often no spend alert until the damage is done.
MCP server credentials. As organizations connect LLMs to tools through MCP, the gateway accumulates OAuth tokens and API keys for every connected service. Salesforce, GitHub, Jira, Slack, databases and internal APIs. A leaked Salesforce token is a data breach. A leaked GitHub token with write access is a supply chain attack vector. A leaked Slack bot token exposes internal communications.
Cloud infrastructure credentials. Gateways running on AWS, GCP or Azure inherit or store cloud credentials for accessing services like S3, KMS or IAM. Portkey's SSRF vulnerability (CVE-2025-66405) demonstrated this risk directly. The gateway's x-portkey-custom-host header allowed unauthenticated attackers to reach AWS's instance metadata service, exposing IAM credentials without ever touching the gateway's own configuration.
CI/CD secrets. The LiteLLM attack itself was circular. TeamPCP used stolen CI/CD secrets from the Trivy compromise to obtain LiteLLM's PyPI publishing credentials. Any organization running LiteLLM in a CI/CD environment exposed its own pipeline secrets to the same payload, feeding the supply chain with new targets.
This is the credential surface area of a modern AI gateway. It is not one key. It is every key the AI stack needs to function.
The Architecture That Made It Possible
LiteLLM stores provider credentials as environment variables and configuration values. They sit in os.environ, in plaintext, in process memory, for the entire lifetime of the service. Portkey's open-source gateway follows a similar pattern.
This is not a bug. It is a design choice, one that follows logically from the proxy architecture. If your gateway is designed as a passthrough (accept a request, attach the API key, forward to the provider), there is no reason to encrypt the key. The proxy needs it in plaintext on every request. The entire trust model assumes the process is not compromised.
The credential exposure exists at every layer:
- At rest: Keys in environment variables or configuration files, readable by any process on the host
- In transit: Keys attached to every outbound request, visible to middleware and interceptors
- In memory: Keys in plaintext for the process lifetime, accessible to any code in the same context, including a supply chain payload
- In logs: Keys appearing in debug output, error traces and crash dumps unless explicitly filtered
When TeamPCP's .pth payload executed inside the LiteLLM process, reading these credentials required no privilege escalation, no memory forensics, no exploit chain. The keys were right there.
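To make the point concrete, here is a minimal sketch of the harvest, assuming a gateway that loads provider keys as environment variables (the key value and variable names are placeholders):

```python
import os

# Simulate a gateway configured the common way: plaintext keys in
# environment variables for the lifetime of the process.
os.environ["OPENAI_API_KEY"] = "sk-demo-not-a-real-key"  # placeholder

# Any code running in the same process -- including a supply chain
# payload -- can harvest every credential with a single dictionary scan.
leaked = {k: v for k, v in os.environ.items()
          if any(tag in k for tag in ("KEY", "TOKEN", "SECRET"))}
print(leaked["OPENAI_API_KEY"])
```

No privilege escalation, no memory forensics: the process's own environment is readable by definition from inside the process.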
What Credential Security Looks Like
If an AI gateway holds credentials for LLM providers, MCP-connected tools, cloud infrastructure and CI/CD pipelines, those credentials deserve the same protection as the most sensitive secrets in the enterprise. Not environment variables. Not configuration files. Envelope encryption with hardware-backed key management.
Layered Key Hierarchy
Rather than storing a credential directly, envelope encryption wraps each secret in a hierarchy of keys:
Master Key is stored in a hardware security module or cloud KMS. Never exposed to application code. Never present in process memory.
Key Encryption Key (KEK) is derived per tenant from the master key using a memory-hard key derivation function. Each tenant's KEK is cryptographically independent.
Data Encryption Key (DEK) is randomly generated per tenant, encrypted by the KEK and stored in the database as ciphertext.
Credential is encrypted by the DEK using AES-256-GCM with a unique initialization vector per operation.
If an attacker achieves code execution, through a supply chain attack, an SSRF or a dependency vulnerability, they find ciphertext. To reach the plaintext, they need the DEK. To reach the DEK, they need the KEK. To reach the KEK, they need the master key, which lives in KMS, outside the process boundary. Each layer requires breaking the one above it.
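The hierarchy above can be sketched end to end in a few lines. This is an illustration, not a production implementation: the master key below is a local stand-in for bytes that would live only in an HSM or cloud KMS, and stdlib scrypt stands in for the memory-hard KDF (the post recommends Argon2id). It assumes the widely used `cryptography` package for AES-256-GCM:

```python
import hashlib
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-in for the KMS-held master key; in production it never enters
# application memory.
master_key = os.urandom(32)

def derive_kek(master: bytes, tenant_salt: bytes) -> bytes:
    # Memory-hard, per-tenant derivation (stdlib scrypt as a stand-in).
    return hashlib.scrypt(master, salt=tenant_salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

kek = derive_kek(master_key, b"tenant-42-salt")

# DEK: random per tenant, stored in the database only as KEK-wrapped
# ciphertext.
dek = os.urandom(32)
wrap_iv = os.urandom(12)
wrapped_dek = AESGCM(kek).encrypt(wrap_iv, dek, None)

# Credential: encrypted under the DEK with a unique IV per operation.
cred_iv = os.urandom(12)
ciphertext = AESGCM(dek).encrypt(cred_iv, b"sk-provider-api-key", None)

# Decryption walks the hierarchy in reverse: unwrap the DEK with the
# KEK, then decrypt the credential with the recovered DEK.
recovered_dek = AESGCM(kek).decrypt(wrap_iv, wrapped_dek, None)
plaintext = AESGCM(recovered_dek).decrypt(cred_iv, ciphertext, None)
print(plaintext)
```

Note what the database and process memory actually hold at rest: `wrapped_dek` and `ciphertext`, both opaque without the layer above them.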
Memory-Hard Key Derivation
The KEK derivation is where computational cost becomes a defense. Argon2id, the current OWASP recommendation, is designed to make brute force economically impractical. Each derivation attempt requires a configurable amount of RAM, typically 19 MiB. This is the property that distinguishes it from older functions like PBKDF2 or bcrypt.
A GPU with thousands of cores cannot parallelize Argon2id the way it can parallelize SHA-256. A thousand concurrent derivation attempts require 19 GiB of memory. A million require 19 TiB. The cost of brute force scales with memory, not compute. And memory is the resource attackers cannot easily scale.
Combined with per-tenant salts, each tenant's KEK derivation is independent. Compromising one tenant's key reveals nothing about another's.
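The per-tenant property is easy to demonstrate. The sketch below uses stdlib `hashlib.scrypt` rather than Argon2id (which typically requires a third-party package such as argon2-cffi); scrypt is also memory-hard, and these parameters cost roughly 16 MiB of RAM per derivation (128 × r × N bytes):

```python
import hashlib
import os

# Memory-hard per-tenant KEK derivation. Stdlib scrypt stands in for
# Argon2id; n=2**14, r=8 costs ~16 MiB of RAM per attempt.
def derive_tenant_kek(master_key: bytes, tenant_salt: bytes) -> bytes:
    return hashlib.scrypt(master_key, salt=tenant_salt,
                          n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

master = os.urandom(32)
kek_a = derive_tenant_kek(master, b"salt-tenant-a")
kek_b = derive_tenant_kek(master, b"salt-tenant-b")
print(kek_a != kek_b)  # different salts: cryptographically independent KEKs
```

Each parallel brute-force attempt pays the full memory cost again, which is what breaks GPU-style parallelism.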
Fresh Entropy Per Operation
AES-256-GCM requires a unique initialization vector for every encryption operation under the same key. Reusing an IV exposes the XOR of the two plaintexts and allows recovery of the GHASH authentication key, enabling ciphertext forgeries, a well-documented, practically exploitable weakness.
Generating a fresh 12-byte random IV from the operating system's cryptographic entropy pool for every operation provides 96 bits of uniqueness. The birthday bound gives approximately 2^48 encryptions before collision risk. The discipline of never reusing an IV is what makes the cryptographic guarantee hold across millions of operations.
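A minimal sketch of the discipline, using only the standard library's CSPRNG, with the birthday-bound arithmetic checked alongside it:

```python
import math
import os

# Fresh 12-byte (96-bit) IV per encryption, drawn from the OS's
# cryptographic entropy pool.
def fresh_iv() -> bytes:
    return os.urandom(12)

# At realistic scales, collisions are astronomically unlikely.
ivs = {fresh_iv() for _ in range(100_000)}
print(len(ivs))  # -> 100000

# Birthday bound: with 96 random bits, collision risk becomes material
# only around sqrt(2**96) = 2**48 encryptions under one key.
print(math.isqrt(2**96) == 2**48)  # -> True
```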
Key Rotation Without Downtime
Key rotation in most AI gateway deployments means changing an environment variable and restarting the service. This is not rotation. It is a deployment.
With envelope encryption and key versioning, rotation is invisible:
- Introduce a new master key alongside the existing one
- Activate it for new encryption operations while maintaining decryption with the old
- Re-encrypt existing DEKs in the background
- Decommission the old key once migration completes
No restart. No credential re-issuance. No coordination with LLM providers or MCP server operators. Each tenant's key version is tracked independently, providing per-tenant audit trails for compliance.
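The four steps above can be sketched with the `cryptography` package's MultiFernet, which encrypts with the first key in its list and decrypts with any of them; Fernet here stands in for the master-key layer, and the wrapped value is a placeholder DEK:

```python
from cryptography.fernet import Fernet, MultiFernet

# Existing ciphertext wrapped under the current (old) master key.
old = Fernet(Fernet.generate_key())
wrapped_dek = old.encrypt(b"tenant-dek-bytes")

# Steps 1-2: introduce a new key; it takes over encryption while the
# old key keeps serving decryption. No restart, no re-issuance.
new = Fernet(Fernet.generate_key())
keyring = MultiFernet([new, old])

# Step 3: background re-encryption of existing ciphertext.
rewrapped = keyring.rotate(wrapped_dek)

# Step 4: decommission the old key once migration completes; the
# re-encrypted value still decrypts under the new key alone.
keyring = MultiFernet([new])
print(keyring.decrypt(rewrapped))
```

Tracking which key version produced each ciphertext is what lets the migration run per tenant, in the background, with an audit trail.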
Credentials at the Edge
There is a harder version of this problem that most AI gateways do not address: decrypting credentials at the network edge without calling back to a central service.
Organizations with data sovereignty requirements cannot route LLM prompts or MCP tool calls through a shared cloud gateway. The data must stay within the organization's network boundary. This means the AI gateway must run locally. But if it calls a central KMS to decrypt credentials on every request, it reintroduces a network dependency that defeats the purpose.
The answer is deterministic key derivation from the edge node's identity. Each node derives its decryption key locally from a cryptographic seed distributed during enrollment. Policy bundles containing encrypted credentials are delivered to the edge. Decryption happens locally. The plaintext never leaves the customer's network. The control plane never sees it in transit.
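A minimal sketch of the idea, assuming an enrollment-time seed and using a single HMAC-SHA256 step as a stand-in for a full KDF such as HKDF (the seed value and node names are illustrative):

```python
import hashlib
import hmac

# Each edge node derives its decryption key locally and deterministically
# from a seed distributed at enrollment -- no KMS round trip on the
# request path.
def edge_key(enrollment_seed: bytes, node_id: str) -> bytes:
    return hmac.new(enrollment_seed, node_id.encode(), hashlib.sha256).digest()

seed = b"seed-distributed-during-enrollment"  # placeholder value
k1 = edge_key(seed, "edge-node-eu-1")
k2 = edge_key(seed, "edge-node-eu-1")
print(k1 == k2)  # deterministic: the node re-derives the same key offline
```

Because derivation is deterministic and local, an edge node that already holds its seed and encrypted policy bundle keeps decrypting even if the control plane goes dark or is compromised.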
If the control plane is compromised, as in the LiteLLM scenario, edge nodes continue operating with locally cached, locally decrypted credentials. The blast radius of a central compromise does not extend to the edge.
The Supply Chain Feeds Itself
The LiteLLM incident is worth studying not as an isolated event but as a pattern. TeamPCP compromised Trivy to steal CI/CD credentials. They used those credentials to compromise LiteLLM's PyPI packages. The compromised packages exfiltrated more credentials, including LLM keys, cloud tokens and MCP server secrets, from every environment that installed them.
Each credential stolen enables the next attack. A GitHub token becomes a new supply chain vector. An AWS credential becomes access to another organization's infrastructure. An MCP server OAuth token becomes a path into Salesforce, Jira or Slack.
The question is not whether your AI gateway will face a similar attack. It is whether your credential architecture limits the blast radius when it does.
Plaintext storage means every credential is exposed simultaneously. Envelope encryption, per-tenant isolation and edge-local decryption mean the attacker gets ciphertext. And breaking it requires resources that make the attack economically irrational.
Encryption Claims Require Proof
Any vendor can describe an encryption architecture in a blog post. The harder question is whether an independent auditor has verified that the controls work. Not as a point-in-time snapshot, but consistently over an extended period.
Ferentin recently completed a SOC 2 Type II attestation, with the audit covering our complete AI infrastructure: the LLM Gateway, the MCP Gateway and our AI Client connectors. The Trust Services Criteria tested included encryption at rest and in transit, tenant isolation, identity-based access controls and network segmentation, the same controls described in this post.
We also completed an independent Vulnerability Assessment and Penetration Testing (VAPT) engagement covering API endpoints, authentication flows, tenant isolation and infrastructure, with no critical or high-severity findings.
Both reports are available under NDA through our Trust Center. No sales call required.