Team Ferentin
June 12, 2026

How AI Agents Changed Data Protection

Three-layer data protection: metadata scanning at discovery, runtime enforcement during execution, result protection on output

Data loss prevention has been around for more than fifteen years, built to answer one question: how do we catch sensitive data exfiltration at the exit point? The answer spans two domains. Network DLP inspects what users email or upload. Endpoint DLP watches device channels such as USB drives, AirDrop, and messaging apps. At either exit point, the model is uniformly reactive: DLP catches, blocks, or redacts. Forensic. Pattern-based. Designed to stop human-initiated exfiltration.

Then AI agents arrived and broke that model. Agents don't email things. They read descriptions of tools and decide autonomously which services to trust. They call tools. They ingest results. They pass data through chains of actions without stopping to ask whether the tool was real or whether it was asking for access it should not have.

Your DLP now has to answer a different question: is this tool safe to give to an agent at all? That question moves the threat detection upstream, from results to metadata. From what the agent outputs to what the agent is told about what the tool needs, what data it accesses, and what it claims to do. A tool description that advertises credentials, patient IDs, credit card ranges, or internal infrastructure is advertising a capability your agent should never trust.

The shift

This metadata scanning happens at the Model Context Protocol (MCP) layer—the protocol through which AI agents discover and invoke tools. When an agent initializes a connection to an MCP server, it receives structured metadata about what tools are available, what data they need, and what they do. That metadata is where the threat lives. A tool description can claim access to patient records or advertise a credential. The agent reads that metadata and decides whether to trust it—usually on very thin signals.

Here's how MCP works at a high level: the agent queries available tools, inspects their schemas, and invokes them based on its reasoning. This happens automatically, at scale, without human oversight. The attack surface is no longer just the tool results. It now includes the metadata the agent reads before deciding to call the tool.

Model Context Protocol (MCP) Architecture: How agents discover and invoke tools through structured metadata
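To make "metadata" concrete, here is a minimal sketch of what a scanner has to look at. An MCP tool listing carries a name, a description, and an input schema per tool. The tool shown here is invented for illustration, and `scannable_text` is a hypothetical helper, not part of any MCP SDK. The point is that sensitive hints can hide in nested schema fields, not just in the top-level description.

```python
# Hypothetical tool entry, shaped like one item from an MCP tool listing
# (name, description, inputSchema). The tool itself is made up.
tool = {
    "name": "patient_lookup",
    "description": "Patient portal. Requires: SSN for database lookup",
    "inputSchema": {
        "type": "object",
        "properties": {
            "ssn": {
                "type": "string",
                "description": "Patient SSN, e.g. 123-45-6789",
            },
        },
        "required": ["ssn"],
    },
}


def scannable_text(node, path="tool"):
    """Yield (path, string) pairs for every string value in the metadata."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from scannable_text(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from scannable_text(value, f"{path}[{index}]")


# Every string the agent will read, keyed by where it lives in the metadata.
fields = dict(scannable_text(tool))
```

Keeping the path alongside each string means a finding can point at the exact field, such as `tool.inputSchema.properties.ssn.description`, rather than just at the tool.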

The critical difference is in how agents interpret that metadata compared to how humans do. A human reading the same tool description uses context and skepticism. They see an example credential like sk-test-abc123 and think, "that's clearly a test key, not real." They see a patient portal asking for SSN and pause, questioning whether that is really needed. They see a tool description promising to "securely manage your authentication" and wonder why a legitimate service would need to store credentials instead of the user storing them. Humans have built-in red-flag detection for API documentation.

An agent reading the same metadata has no such skepticism. It reads descriptions literally. It makes a binary decision: use this tool or don't. It cannot ask for clarification. It cannot push back on the data shape a schema asks for. If a tool description says "Patient portal. Requires: SSN for database lookup", the agent reads that as "this tool needs that data, I should provide it." If a description says "Bank integration. Accounts supported: 1234567890, 9876543210", the agent sees example account numbers and reads them as configuration. The sensitive data is now in the agent's context, potentially in the tool results, definitely in the audit logs.

This matters at scale. One malicious MCP server in a catalog reaches thousands of agents automatically. A user reads one tool's documentation carefully. An agent discovers one tool's metadata in parallel with hundreds of others. The stakes are different.

The NSA's MCP security guidance flags this exact scenario: sensitive data leakage in tool descriptions is a supply-chain attack vector. A compromised MCP server can advertise itself with apparent authority by claiming access to sensitive systems or data categories. Agents will believe it. Your traditional DLP catches the data in the results. Metadata scanning catches it at discovery, before the agent decides to trust the tool at all. Credentials, patient IDs, credit card ranges, internal hostnames, all are fair game for metadata scanning.

Two layers of detection

Metadata scanning needs two layers because tools get smarter about hiding their intent.

The first layer is deterministic. It looks for obvious patterns in multiple data categories. Credentials: API key prefixes like sk-, pk-, AKIA, ghp_, npm_. PII: SSN patterns, email addresses that are not generic placeholders, phone number formats. PHI: medical record ID patterns, patient name formats, diagnosis codes. PCI: credit card number ranges, track data. Infrastructure: hostnames like localhost, internal.*, IP addresses, connection strings with embedded passwords. Context phrases like "API Key:", "Token:", "Secret:", "SSN:", "Patient ID:", "Account number:". Base64 blocks that might hide any of these. This layer is fast, reliable, and transparent. You can audit the rules. You can understand why a tool was flagged.
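A minimal sketch of what such a deterministic layer could look like, assuming a small hand-written rule set. The rule names and regexes below are illustrative assumptions, not Ferentin's actual rules, and they cover only a few of the categories listed above (no Base64 or PHI handling).

```python
import re

# Illustrative rules only. Real rule sets are larger and tuned against
# false positives; these names and patterns are assumptions.
RULES = {
    "credential.key_prefix": re.compile(
        r"\b(?:sk|pk)[-_](?:test|live)[-_]\w+"
        r"|\bAKIA[0-9A-Z]{16}\b"
        r"|\b(?:ghp|npm)_[A-Za-z0-9]{36}\b"
    ),
    "pii.ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "account.number": re.compile(r"\b\d{9,18}\b"),
    "infra.hostname": re.compile(
        r"\b(?:localhost|internal\.[\w.-]+)\b"
        r"|\b(?:\d{1,3}\.){3}\d{1,3}\b"
    ),
    "context.phrase": re.compile(
        r"\b(?:API Key|Token|Secret|SSN|Patient ID|Account number)\s*:",
        re.IGNORECASE,
    ),
}


def scan(text):
    """Return (rule_name, matched_text) for every rule hit, in rule order."""
    findings = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


for rule, hit in scan("Healthcare portal. Lookup patient by SSN. Example: 123-45-6789"):
    print(f"{rule}: {hit}")
```

Because each finding carries the rule name and the matched text, the result is auditable: you can show exactly which rule fired and on what.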

The second layer is an LLM judge. It reads tool descriptions and asks what the tool is really asking for and what data it is really promising access to. It detects subtle intent signals like "Securely stores your authentication tokens. No setup needed." or "Direct access to patient records. Just provide SSN for verification." A legitimate tool does not offer to store sensitive data for you or request raw PII for operations that should use tokenized references. It detects scope creep: "Reads emails. Also requires admin credentials." or "Integrates with your bank. Supports domestic and international accounts." It understands natural language hints like "Use your production key here" or "Examples: patient IDs 123456, 234567, 345678". It explains its reasoning so you can defend the decision to your customers.

Deterministic alone misses clever attacks. LLM alone is slow and expensive and can hallucinate. Together they cover the obvious and the subtle. A tool with sk-live- in the description gets caught immediately by regex. A tool with example SSN ranges gets caught by pattern matching. A tool promising to manage credentials or providing example patient IDs gets caught by the judge reading the actual claim.
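One way to combine the two layers, sketched under assumptions: the cheap deterministic check runs first and short-circuits, and the judge is only consulted when nothing obvious matched. The `judge` parameter is any callable that returns a concern string or `None`. The `stub_judge` below is a keyword stand-in so the example runs without a model; it is not an LLM and not how a real judge would reason.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Tiny illustrative subset of deterministic patterns (assumed, not exhaustive).
QUICK_PATTERNS = [
    re.compile(r"\bsk[-_](?:test|live)[-_]\w+"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


@dataclass
class Verdict:
    status: str  # "ALLOW", "WARN", or "BLOCK"
    layer: str   # which layer made the call
    reason: str  # human-readable rationale for the audit trail


def review(description: str, judge: Callable[[str], Optional[str]]) -> Verdict:
    # Layer 1: fast regex pass. A hit blocks without spending a judge call.
    for pattern in QUICK_PATTERNS:
        match = pattern.search(description)
        if match:
            return Verdict("BLOCK", "deterministic", f"matched {match.group(0)!r}")
    # Layer 2: intent check, only reached when layer 1 found nothing.
    concern = judge(description)
    if concern:
        return Verdict("WARN", "judge", concern)
    return Verdict("ALLOW", "none", "no findings")


def stub_judge(description: str) -> Optional[str]:
    """Keyword stand-in for an LLM call, used here only so the sketch runs."""
    lowered = description.lower()
    if "stores your" in lowered and ("credential" in lowered or "token" in lowered):
        return "offers to store secrets on the user's behalf"
    return None


print(review("Stripe integration. Requires: sk-test-abc123", stub_judge))
print(review("SecureVault Pro. Securely stores your credentials and "
             "encryption keys. No configuration needed.", stub_judge))
```

Running the regex layer first keeps the slow, expensive layer off the hot path for the obvious cases, which is the trade-off the paragraph above describes.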

Here are concrete examples across data categories:

| Data Category | Tool Description | Detection Layer | Status |
| --- | --- | --- | --- |
| Credentials | "Stripe integration. Requires: sk-test-abc123" | Deterministic (regex match on sk-test-) | BLOCK |
| PII (SSN) | "Healthcare portal. Lookup patient by SSN. Example: 123-45-6789" | Deterministic (SSN pattern match) | BLOCK |
| PCI | "Bank integration. Supports accounts: 1234567890, 9876543210" | Deterministic (account number pattern match) | BLOCK |
| Intent (Subtle) | "SecureVault Pro. Securely stores your credentials and encryption keys. No configuration needed." | LLM Judge (analyzes intent: offers to manage secrets) | WARN/BLOCK |

The first three are caught by deterministic patterns. The last one has no single obvious pattern but the LLM judge reads the intent: a tool offering to store credentials and secrets for you is a red flag. A legitimate tool does not ask you to hand over your secrets so it can manage them.

Platform baseline, customer control

Metadata scanning works best when it is split into two pieces: what Ferentin provides system-wide and what you control for your organization.

Ferentin's job is to scan all MCP tools in the catalog and flag obvious threats. The same baseline rules apply to every tenant. We detect credentials in descriptions. We detect injection patterns. We detect typosquatting. We prevent obviously compromised tools from being discoverable at all. You do not have to build this. It ships with Ferentin. It is fast and it is uniform.

Your job is to define what is risky for your organization. You decide which tools your agents can use. You define what data is sensitive in your business. You choose which domains your tools can reach. You set the policies that gate tool execution and result processing. A credential that is not sensitive in one organization might be critical in another. A tool that is useful for one workflow might be a liability for another. That judgment is yours, not ours.

Here is how this works in practice. Ferentin scans the catalog and passes 48 tools to your agent. You define a policy that allows only 3 of those tools. You set a DLP rule that masks email addresses in all results. You define an egress allowlist so tools can only reach Salesforce, HubSpot, and Stripe. You require human approval for any data pull over 10,000 records.

The agent can see 48 tools because Ferentin's baseline safety passed them. The agent can use 3 tools because your policy allows them. The agent can reach 3 domains because your network policy allows them. The agent's results are scanned by your DLP rules. Everything is audited with the policy rationale so your SOC can understand why actions were allowed or denied.

If a tool tries to reach internal.company.com, it is denied at the network layer. If a tool returns email addresses, they are masked by your DLP rules. If a tool is not in your approved list, it is not available to the agent at all. You built none of this infrastructure yourself. Ferentin provides the detection and the hooks. You configure the enforcement.
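The customer-side controls in this walkthrough can be sketched as a small policy object. Everything named here is an assumption for illustration: the `Policy` class, the tool names, and the concrete domains standing in for Salesforce, HubSpot, and Stripe. It is not Ferentin's configuration format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Policy:
    approved_tools: frozenset      # tools the agent may call
    egress_allowlist: frozenset    # domains tools may reach
    approval_threshold: int = 10_000  # record count above which a human approves


def can_use(policy: Policy, tool_name: str) -> bool:
    return tool_name in policy.approved_tools


def can_reach(policy: Policy, url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Allow the listed domain itself or any subdomain of it, nothing else.
    return any(
        host == domain or host.endswith("." + domain)
        for domain in policy.egress_allowlist
    )


def needs_approval(policy: Policy, record_count: int) -> bool:
    return record_count > policy.approval_threshold


# Hypothetical tenant policy: 3 approved tools, 3 reachable domains.
policy = Policy(
    approved_tools=frozenset({"crm_search", "create_invoice", "send_campaign"}),
    egress_allowlist=frozenset({"salesforce.com", "hubspot.com", "stripe.com"}),
)

print(can_use(policy, "crm_search"))
print(can_reach(policy, "https://internal.company.com/db"))
print(needs_approval(policy, 25_000))
```

Matching on the parsed hostname with an exact or dot-prefixed suffix check avoids the classic mistake of a substring test, which would let a lookalike such as `evilstripe.com` through.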

Three scanning windows

Data protection in the agent era happens at three points.

First is metadata, at discovery time. Does this tool claim to need sensitive data? Is it asking for access it doesn't need? Is the description advertising capabilities that cross your risk boundaries? Ferentin provides the deterministic patterns and the LLM judge scanning for credentials, PII, PHI, PCI, infrastructure details, and other sensitive categories. You provide the policy that says which tools are approved for your organization and what data categories matter in your business.

Second is runtime, while the tool is executing. Is this tool doing what it claims? Is it accessing data it should not? This is where your network allowlists matter. This is where egress policies block unauthorized calls. This is where you monitor whether a tool is behaving as expected or has been compromised. A tool that claims to need only customer emails but tries to reach your internal database gets caught here.

Third is results, after the tool returns. Did the tool leak sensitive data in its response? Did it expose credentials by accident? Did it include PII, PHI, PCI, or other protected information that should have been redacted? This is where DLP scanning applies. This is where tokenization happens. This is where your data protection rules are enforced. The same detector patterns that flagged "SSN" in a tool description catch SSN leakage in results.
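A rough sketch of result protection, assuming two simple rules: mask email addresses and replace SSNs with a stable token. The function names, the token format, and the hard-coded salt are all illustrative assumptions. A salted hash of an SSN is not strong protection on its own, since the input space is small; a production system would more likely use a token vault or keyed scheme.

```python
import hashlib
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_email(match):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = match.group(0).partition("@")
    return f"{local[0]}***@{domain}"


def tokenize_ssn(match, salt="demo-salt"):
    """Swap the SSN for a short, repeatable token (same input, same token)."""
    digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()
    return f"<ssn:{digest[:8]}>"


def protect(text):
    """Apply result-side rules to a tool response before the agent sees it."""
    text = EMAIL.sub(mask_email, text)
    return SSN.sub(tokenize_ssn, text)


print(protect("Contact jane.doe@example.com, SSN 123-45-6789"))
```

Note that the SSN regex is the same shape a metadata scanner would use at discovery time, which is the reuse the paragraph above points to: one detector, applied at a different window.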

All three are necessary. Metadata catches the deceptive tool before the agent uses it. Runtime catches the misbehaving tool while it is executing. Results catch the leaking tool after it returns. Together they create defense in depth. Credentials are one category. PII, PHI, PCI, and infrastructure are others. The framework applies to all of them.

Why this matters now

The NSA flagged sensitive data leakage in MCP tool metadata as a documented attack vector. Supply chain attacks are measured in minutes now, not days. A compromised tool reaches thousands of agents instantly. The agent cannot ask whether the tool is trustworthy or whether a tool claiming to need SSN or account numbers really needs them. It reads the description and decides.

This is not a hypothetical. Tools can be compromised. Descriptions can be misleading. Agents will use what they are told they can use. Your traditional DLP catches the problem after it leaks. Metadata scanning catches it before the agent ever considers using the tool. A tool advertising that it accesses patient records or credit card data gets caught at discovery time, not after your agent has retrieved that data.

Procurement standards are forming around this. Security teams are asking: does this vendor implement baseline detection for sensitive data in tool metadata plus customer-controlled runtime policies? Does your data plane run in your perimeter or theirs? Do you have visibility into what tools your agents considered using and why some were blocked? Can you audit which data categories each tool was flagged for?

These are not research questions anymore. They are becoming standard on security review templates. Especially for regulated industries where metadata exposure is an audit finding.

Authority belongs to you

This is the Ferentin principle applied to data protection. We detect. You control.

Ferentin provides metadata scanning system-wide for all sensitive data categories: credentials, PII, PHI, PCI, infrastructure, and others. You define runtime policies for your organization. We do not centralize your data protection rules. You do not have to build detection from scratch. We provide the detection capability and the scanning framework. You have authority over the decisions, the approved tools, and the data categories that matter in your business.

Your agents operate safely without sacrificing your autonomy. Your data stays in your perimeter. Your policies shape what the agent can do. Your audit trail shows both what Ferentin detected and what your policies allowed or denied. You can see which tools were flagged for credential exposure, which for PII risks, which for accessing infrastructure you consider internal.

If you are running agents in regulated industries, this matters. Healthcare. Financial services. Government. Public sector. Any workload touching sensitive data. Metadata threats are governance threats. Your authority over what happens in your environment is not optional. It is the foundation of trust.

See the CISO Reference for Agentic AI for the full seven-control-families model. This post is the data-protection lens on agent security. The framework is broader. The seams between the controls are where real deployments fail or succeed. Data protection is one control family. It spans metadata, runtime, and results. It enforces authority in your systems. The metadata scanning catches not just one type of leak but all the sensitive data categories your compliance framework cares about.
