Add a tone-guardrail and policy-enforcement layer to a support agent

Produces a guardrail layer that intercepts a support agent's draft reply, rewrites it to match the brand voice, and blocks policy violations (unauthorized refund promises, SLA commitments, unsafe content) before the reply reaches the customer.

Prompt

You are a senior AI safety engineer who builds guardrail layers that sit between a model's draft and the customer, enforcing tone and policy on every message.

Design a guardrail layer that takes a draft support reply, checks it, and passes, rewrites, blocks, or escalates it before it is sent.

Context:
- Brand voice rules: [THE CONCRETE RULES — e.g. 'NO EXCLAMATIONS, NO APOLOGY-LOOPS, NO BLAMING THE CUSTOMER']
- Hard policy blocks (must NEVER reach customer): [LIST — e.g. 'UNAUTHORIZED REFUND PROMISES, SLA COMMITMENTS, LEGAL/MEDICAL ADVICE, EXPLETIVES, DISCLOSING INTERNAL SYSTEMS']
- Soft rewrites (allowed but must be rephrased): [e.g. 'OVER-APOLOGIZING, ROBOTIC AS-AI PHRASING, JARGON THE CUSTOMER WILL NOT KNOW']
- Sensitive triggers requiring human review: [THREATS TO SAFETY, CHURN/LEGAL LANGUAGE, VULNERABLE CUSTOMERS]
- Output of upstream agent: [A DRAFT REPLY IN NATURAL LANGUAGE]

Produce:
1. The guardrail's role and operating model, written in second person ('You are…'). It is a gatekeeper: it receives a draft and the customer's original message, and it returns a decision. It never chats.
2. Decision types: PASS (send as-is), REWRITE (return the corrected version), BLOCK (do not send; route to a human with a reason), ESCALATE (the draft passes the tone check, but the ticket is flagged for human review). Define each precisely.
3. The check sequence, run in order: (a) safety & policy hard-block scan, (b) sensitive-trigger escalation scan, (c) tone & brand-voice check, (d) factual-discipline check (no invented policy or numbers), (e) final pass. Explain why the order matters: safety comes before tone.
4. Rewrite rules: when the guardrail rewrites, it preserves the agent's intent and information and changes only what violates a rule. It returns the rewritten text and a one-line note on what changed.
5. Output schema: a fixed structure with the fields decision, final_text (if pass/rewrite), reason, route_to (if block/escalate), and confidence. No prose beyond the schema.
6. Auditability: every decision logs the trigger, the rule, and the action, so a human can review blocks and rewrites later.

Rules:
- The guardrail never contacts the customer and never auto-resolves a sensitive trigger; sensitive triggers always route to a human.
- It does not invent policy. If the draft makes a claim that the guardrail cannot verify against the rules, it returns BLOCK with the reason 'unverified claim'.
- Tone fixes must preserve meaning; it must not soften a correct no into a misleading maybe.
- Safety and policy checks always win over tone. Never pass a hard-block for the sake of politeness.

Output: the guardrail system prompt, decision definitions, the ordered check sequence, rewrite rules, output schema, and audit-logging spec.

Success signal: the output is good only if the guardrail returns a fixed decision schema, runs safety/policy checks before tone, blocks every unverified policy claim, and routes all sensitive triggers to a human rather than auto-resolving.
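
As a rough illustration of what a conforming result could look like, the sketch below models the four decisions, the fixed output schema, and the ordered check sequence in Python. Everything named here (`Decision`, `GuardrailResult`, `run_checks`, the phrase lists, the `human_review` route) is hypothetical and not part of the prompt, and the keyword matching only stands in for whatever classifier or model call a real guardrail would use. Scanning the customer's message for sensitive triggers is also an assumption.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(str, Enum):
    PASS = "PASS"
    REWRITE = "REWRITE"
    BLOCK = "BLOCK"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class GuardrailResult:
    decision: Decision
    reason: str
    confidence: float
    final_text: Optional[str] = None  # set only for PASS / REWRITE
    route_to: Optional[str] = None    # set only for BLOCK / ESCALATE

    def __post_init__(self):
        # Enforce the "fixed schema" rule: each decision carries exactly its own fields.
        sends = self.decision in (Decision.PASS, Decision.REWRITE)
        if sends and (self.final_text is None or self.route_to is not None):
            raise ValueError("PASS/REWRITE need final_text and no route_to")
        if not sends and (self.route_to is None or self.final_text is not None):
            raise ValueError("BLOCK/ESCALATE need route_to and no final_text")


# Hypothetical placeholder rules; a real deployment would load these from a policy document.
HARD_BLOCK_PHRASES = ("we will refund", "we guarantee")
SENSITIVE_PHRASES = ("lawyer", "cancel my account")
TONE_REPLACEMENTS = {"!": "."}   # brand rule assumed here: no exclamations
VERIFIED_NUMBERS = {"30"}        # numbers that the (assumed) policy actually states


def run_checks(draft: str, customer_message: str) -> GuardrailResult:
    # (a) Safety and policy hard blocks run first and short-circuit everything else.
    for phrase in HARD_BLOCK_PHRASES:
        if phrase in draft.lower():
            return GuardrailResult(Decision.BLOCK, f"hard block: '{phrase}'", 0.9,
                                   route_to="human_review")
    # (b) Sensitive triggers go to a human; they are never auto-resolved.
    for phrase in SENSITIVE_PHRASES:
        if phrase in customer_message.lower():
            return GuardrailResult(Decision.ESCALATE, f"sensitive trigger: '{phrase}'", 0.8,
                                   route_to="human_review")
    # (c) Tone fixes change only the violating characters.
    rewritten = draft
    for bad, good in TONE_REPLACEMENTS.items():
        rewritten = rewritten.replace(bad, good)
    # (d) Factual discipline: any number the rules cannot confirm blocks the draft.
    if any(n not in VERIFIED_NUMBERS for n in re.findall(r"\d+", draft)):
        return GuardrailResult(Decision.BLOCK, "unverified claim", 0.7,
                               route_to="human_review")
    # (e) Final pass.
    if rewritten != draft:
        return GuardrailResult(Decision.REWRITE, "tone: replaced exclamation marks", 0.8,
                               final_text=rewritten)
    return GuardrailResult(Decision.PASS, "no violation found", 0.8, final_text=draft)
```

Returning early at steps (a) and (b) is one way to make "safety before tone" structural: a draft that trips a hard block never reaches the rewrite step, so it cannot be polished into something that looks sendable.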

Use case

Use when you already have a support agent generating replies and need a safety net that enforces tone and policy on every message.

When to use this

Place it between reply generation and customer delivery. It is a filter, not the agent itself.
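
A minimal sketch of that placement, assuming the guardrail returns a dictionary with the fields decision, final_text, reason, and route_to. The function `handle_ticket`, the stub agent and guardrail, and the in-memory `outbox`, `review_queue`, and `audit_log` lists are all hypothetical stand-ins for real components.

```python
from typing import Any, Callable, Dict, List


def handle_ticket(
    customer_message: str,
    generate_draft: Callable[[str], str],
    guardrail: Callable[[str, str], Dict[str, Any]],
    outbox: List[str],
    review_queue: List[Dict[str, Any]],
    audit_log: List[Dict[str, Any]],
) -> str:
    """Run one ticket through agent -> guardrail -> delivery or human review."""
    draft = generate_draft(customer_message)
    result = guardrail(draft, customer_message)

    # Every decision is logged, whether or not anything is sent.
    audit_log.append({"decision": result["decision"], "reason": result["reason"]})

    if result["decision"] in ("PASS", "REWRITE"):
        # Only guardrail-approved text is ever delivered.
        outbox.append(result["final_text"])
    else:
        # BLOCK and ESCALATE go to a human; the guardrail itself never replies.
        review_queue.append(
            {"draft": draft, "reason": result["reason"], "route_to": result["route_to"]}
        )
    return result["decision"]


# Stubs standing in for the real support agent and guardrail.
def stub_agent(message: str) -> str:
    return "We guarantee a fix by Friday."


def stub_guardrail(draft: str, customer_message: str) -> Dict[str, Any]:
    if "guarantee" in draft.lower():
        return {"decision": "BLOCK", "final_text": None,
                "reason": "SLA commitment", "route_to": "human_review"}
    return {"decision": "PASS", "final_text": draft,
            "reason": "no violation", "route_to": None}


outbox: List[str] = []
review_queue: List[Dict[str, Any]] = []
audit_log: List[Dict[str, Any]] = []
decision = handle_ticket("When will this be fixed?", stub_agent, stub_guardrail,
                         outbox, review_queue, audit_log)
```

In this run the stub draft contains a commitment, so nothing lands in `outbox` and the draft goes to `review_queue` with its reason attached.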

Follow-up prompts

Build the policy-rules document this layer enforces, versioned and reviewable.
Write a test suite of 30 risky replies to validate that the guardrail catches them.
Design the human-review queue for messages the guardrail blocks or rewrites.

#customer-support #guardrails #ai-agents #policy #automation

Source: promptfork seed
License: CC-BY-4.0
Published: 6/22/2026
