Data Loss Prevention for AI Agent Pipelines

AI agents process sensitive data at machine speed. DAT's egress DLP scanner intercepts PII and secrets at four critical points in the agent pipeline, so Social Security numbers, credit card numbers, and API keys never leak into LLM context, tool outputs, or long-term memory.

Seven Categories of PII, Caught in Milliseconds

Purpose-built regex scanner with algorithmic validation where pattern matching alone is not enough.

Beyond Simple Pattern Matching

Many DLP tools rely on regular expressions alone, so they raise a false positive on any 16-digit number. DAT's scanner pairs high-precision regex patterns with algorithmic validation: credit card numbers are verified with the Luhn checksum, and API keys are matched against real provider formats. The result is a scanner that catches real PII without flagging your order confirmation numbers.

  • SSN — US Social Security numbers (XXX-XX-XXXX format)
  • Credit Card — Visa, Mastercard, Amex, Discover with Luhn checksum verification
  • Email Address — Standard email format detection
  • Phone Number — US and international formats
  • API Key — OpenAI, AWS, Stripe, and generic secret key patterns
  • IP Address — IPv4 addresses including internal ranges
  • Passport — US passport number format
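
The Luhn check mentioned above can be sketched in a few lines. This is a minimal illustration of the standard checksum, not DAT's actual implementation; the function name `luhn_valid` and the 13-digit minimum length are assumptions made for the example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumed minimum card length for this sketch
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4532015112830366"))  # True
print(luhn_valid("1234567890123456"))  # False
```

A candidate that matches the card regex but fails this check would be left alone, which is how a 16-digit order number can avoid being flagged.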

PII Scanner in Action
==============================

Input (tool output):
  "Customer John Smith, SSN
   123-45-6789, paid with card
   4532015112830366. Contact
   at [email protected] or call
   (555) 867-5309. Server at
   10.0.1.42, key sk-proj-abc123"

After DLP scan (redact mode):
  "Customer John Smith, SSN
   [SSN_REDACTED], paid with card
   [CREDIT_CARD_REDACTED]. Contact
   at [EMAIL_REDACTED] or call
   [PHONE_REDACTED]. Server at
   [IP_ADDRESS_REDACTED], key
   [API_KEY_REDACTED]"

Credit Card Validation:
  4532015112830366
  Luhn checksum: VALID
  -> Redacted

  1234567890123456
  Luhn checksum: INVALID
  -> Not flagged (not a real CC)

Scan time: ~0.5ms per 20KB
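
The redaction shown above could be approximated as follows. This is a rough sketch under stated assumptions: the patterns are simplified stand-ins rather than DAT's real rules, only four of the seven categories are covered, and `PATTERNS` and `redact` are hypothetical names.

```python
import re

# Hypothetical, simplified patterns; a production scanner would be stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9-]{6,}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [CATEGORY_REDACTED] token.

    Returns the sanitized text and the list of categories that fired.
    """
    found = []
    for category, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{category}_REDACTED]", text)
        if count:
            found.append(category)
    return text, found


clean, hits = redact("SSN 123-45-6789, server at 10.0.1.42, key sk-proj-abc123")
print(clean)  # SSN [SSN_REDACTED], server at [IP_ADDRESS_REDACTED], key [API_KEY_REDACTED]
print(hits)   # ['SSN', 'IP_ADDRESS', 'API_KEY']
```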

Two Modes: Redact or Block

Choose whether to sanitize sensitive data or prevent it from flowing entirely.

Proportional Response to Data Risk

Not every PII exposure carries the same risk. A customer email address in a support ticket is different from a credit card number in an LLM prompt. DAT gives you two enforcement modes so you can match the response to the threat level.

  • Redact Mode — Replaces PII with descriptive tokens like [SSN_REDACTED]. The agent continues working with sanitized data. The LLM never sees the original value
  • Block Mode — Withholds the entire tool output when PII is detected. The agent receives an error and must try a different approach. Zero data exposure
  • Per-Org Allowlists — Known-safe patterns (your company's IP range, a shared inbox address) can be allowlisted so they pass through without triggering the scanner
  • Per-Category Control — Enable or disable individual PII categories per organization. Financial agents may need credit card detection; internal tools may not

Both modes generate SIEM events. Redactions produce severity-4 alerts; blocks produce severity-7. Your SOC team sees exactly what was caught and where.
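
To make the two modes concrete, here is a minimal sketch of how enforcement and SIEM events might fit together. The event names and severities come from this page; the `ScanResult` type, the `enforce` function, and the single SSN pattern are assumptions made to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Optional

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one category only, for brevity


@dataclass
class ScanResult:
    output: Optional[str]   # None means the tool output was withheld
    event: Optional[dict]   # SIEM event, or None when nothing was detected


def enforce(text: str, mode: str) -> ScanResult:
    """Apply redact or block mode to a piece of text."""
    if mode not in ("redact", "block"):
        raise ValueError(f"unknown mode: {mode}")
    if not SSN.search(text):
        return ScanResult(output=text, event=None)
    if mode == "block":
        # Withhold everything; the agent receives an error instead.
        return ScanResult(None, {"type": "pii_blocked", "severity": 7})
    # Redact mode: the agent continues with sanitized data.
    sanitized = SSN.sub("[SSN_REDACTED]", text)
    return ScanResult(sanitized, {"type": "pii_redacted", "severity": 4})


print(enforce("SSN is 123-45-6789", "redact").output)  # SSN is [SSN_REDACTED]
print(enforce("SSN is 123-45-6789", "block").event)    # {'type': 'pii_blocked', 'severity': 7}
```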

DLP Policy Configuration
==============================

Redact Mode:
  Input:  "SSN is 123-45-6789"
  Output: "SSN is [SSN_REDACTED]"
  Agent:  Continues with clean data
  SIEM:   pii_redacted (severity 4)

Block Mode:
  Input:  "SSN is 123-45-6789"
  Output: ERROR - PII detected
  Agent:  Must retry without PII
  SIEM:   pii_blocked (severity 7)

Per-Org Policy:
  {
    "enabled": true,
    "mode": "redact",
    "categories": [
      "ssn",
      "credit_card",
      "api_key",
      "ip_address",
      "passport"
    ],
    "allowlist": [
      "10.0.0.0/8",
      "[email protected]"
    ]
  }

Allowlisted patterns pass through
without triggering the scanner.
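
One way an allowlist check like this could work is sketched below. How DAT actually matches allowlist entries is not described here, so the semantics (exact string match, plus CIDR containment for IP addresses) are assumptions, as are the function name `is_allowlisted` and the sample inbox address.

```python
import ipaddress


def is_allowlisted(value: str, allowlist: list[str]) -> bool:
    """Return True if `value` equals an entry or falls inside a CIDR entry."""
    for entry in allowlist:
        if value == entry:
            return True
        try:
            network = ipaddress.ip_network(entry, strict=False)
            address = ipaddress.ip_address(value)
        except ValueError:
            continue  # entry is not a network, or value is not an IP address
        if address.version == network.version and address in network:
            return True
    return False


# Hypothetical allowlist: a CIDR range plus a made-up shared inbox.
allowlist = ["10.0.0.0/8", "shared-inbox@example.com"]

print(is_allowlisted("10.0.1.42", allowlist))                 # True
print(is_allowlisted("192.168.1.5", allowlist))               # False
print(is_allowlisted("shared-inbox@example.com", allowlist))  # True
```

A scanner built this way would consult the allowlist for each match before redacting or blocking, so internal addresses pass through while everything else is still caught.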

Four Critical Scan Points

PII is intercepted at every stage of the agent pipeline, not just at the output.

Defense in Depth for Agent Data Flows

A single scan point is a single point of failure. DAT scans data at four separate stages in the agent execution pipeline. Even if PII enters the system through a user's task description, it is caught before reaching the LLM. Even if a tool returns sensitive data, it is caught before being stored in memory.

  • Inbound Goal — The user's task description is scanned before the ReAct loop starts. The LLM sees [SSN_REDACTED], never the real value. This also prevents LLM safety refusals triggered by raw PII
  • Tool Output — Every tool result is scanned before it becomes the LLM's observation. Prevents web scrape results, API responses, or email content from leaking PII into the context window
  • Conversation Memory — Task summaries stored in Redis for cross-task context are scanned before persistence. Follow-up tasks cannot resurface redacted data
  • RAG Memory — Long-term memories stored in pgvector are scanned before embedding. Semantic search cannot retrieve PII that was never stored
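
The four scan points can be sketched as a single task flow. This is an illustrative outline only: plain Python lists stand in for Redis and pgvector, the tool is a stub, the LLM step is omitted, and `scan` and `run_task` are hypothetical names covering a single SSN pattern.

```python
import re
from typing import Callable

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan(text: str) -> str:
    """Stand-in for the DLP scanner in redact mode (SSN only)."""
    return SSN.sub("[SSN_REDACTED]", text)


def run_task(goal: str, tool: Callable[[str], str],
             conversation_memory: list[str], rag_memory: list[str]) -> str:
    safe_goal = scan(goal)                     # 1. inbound goal
    observation = scan(tool(safe_goal))        # 2. tool output
    summary = f"Goal: {safe_goal} | Result: {observation}"
    conversation_memory.append(scan(summary))  # 3. conversation memory (Redis on this page)
    rag_memory.append(scan(summary))           # 4. RAG memory (pgvector on this page)
    return observation


def fake_tool(query: str) -> str:
    # Stub tool that returns sensitive data, as a scraped page might.
    return "Record found: SSN 987-65-4321"


conversation, rag = [], []
result = run_task("Find info on SSN 123-45-6789", fake_tool, conversation, rag)
print(result)           # Record found: SSN [SSN_REDACTED]
print(conversation[0])  # Goal: Find info on SSN [SSN_REDACTED] | Result: Record found: SSN [SSN_REDACTED]
```

Scanning again at the two memory writes is redundant in this toy flow, but it reflects the defense-in-depth idea: each stage is protected even if an earlier one is bypassed.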

DLP Scan Points in Agent Pipeline
==============================

1. INBOUND (Task Goal)
   User: "Find info on SSN 123-45-6789"
   LLM sees: "Find info on [SSN_REDACTED]"
   Why: Prevents LLM safety refusal
        + never enters context window

2. TOOL OUTPUT (Observation)
   web_search returns page with CC#
   Agent sees: [CREDIT_CARD_REDACTED]
   Why: Scraped data often contains
        PII from public sources

3. CONVERSATION MEMORY (Redis)
   Task result stored for follow-ups
   Stored as: sanitized summary
   Why: "Send that to my wife" can't
        resurface redacted data

4. RAG MEMORY (pgvector)
   Long-term memory auto-indexed
   Embedded as: clean text
   Why: Semantic search can't find
        PII that was never stored

Each scan: ~0.5ms per 20KB
Total pipeline overhead: <2ms

  • 7 PII Categories
  • Luhn CC Checksum
  • 4 Scan Points
  • <1ms Scan Latency

Ship AI Agents Without Shipping Your Data

Enterprise-grade PII protection for every agent in your fleet. Configurable per-org, enforced automatically.