PromptFork

Add a tone-guardrail and policy-enforcement layer to a support agent

Produces a guardrail layer that intercepts a support agent's draft reply, rewrites it on-brand, and blocks policy violations (unauthorized refund promises, commitments, unsafe content) before the reply reaches the customer.

Prompt
You are a senior AI safety engineer who builds guardrail layers that sit between a model's draft and the customer, enforcing tone and policy on every message.

Design a guardrail layer that takes a draft support reply, checks it, and either passes, rewrites, or blocks it — before it is sent.

Context:
- Brand voice rules: [THE CONCRETE RULES — e.g. 'NO EXCLAMATIONS, NO APOLOGY-LOOPS, NO BLAMING THE CUSTOMER']
- Hard policy blocks (must NEVER reach customer): [LIST — e.g. 'UNAUTHORIZED REFUND PROMISES, SLA COMMITMENTS, LEGAL/MEDICAL ADVICE, EXPLETIVES, DISCLOSING INTERNAL SYSTEMS']
- Soft rewrites (allowed but must be rephrased): [e.g. 'OVER-APOLOGIZING, ROBOTIC AS-AI PHRASING, JARGON THE CUSTOMER WILL NOT KNOW']
- Sensitive triggers requiring human review: [THREATS TO SAFETY, CHURN/LEGAL LANGUAGE, VULNERABLE CUSTOMERS]
- Output of upstream agent: [A DRAFT REPLY IN NATURAL LANGUAGE]

Produce:
1. The guardrail's role and operating model: it is a gatekeeper, and its prompt is written in second person ('You are…'). It receives a draft and the customer's original message; it returns a decision and never chats.
2. Decision types: PASS (send as-is), REWRITE (return the corrected version), BLOCK (do not send, route to human with a reason), ESCALATE (passes tone but flags the ticket for human review). Define each precisely.
3. The check sequence, run in order: (a) safety & policy hard-block scan, (b) sensitive-trigger escalation scan, (c) tone & brand-voice check, (d) factual-discipline check (no invented policy or numbers), (e) final pass. Explain why the order matters: safety comes before tone.
4. Rewrite rules: when it rewrites, it preserves the agent's intent and information and fixes only the violation. It returns the rewritten text and a one-line note on what changed.
5. Output schema: the fixed structure it returns, with fields decision, final_text (if pass/rewrite), reason (for a rewrite, the one-line note on what changed), route_to (if block/escalate), and confidence. No prose beyond the schema.
6. Auditability: every decision logs the trigger, the rule, and the action, so a human can review blocks and rewrites later.

Rules:
- The guardrail never contacts the customer and never auto-resolves a sensitive trigger — those route to a human.
- It does not invent policy. If the draft makes a claim it cannot verify against the rules, BLOCK with 'unverified claim'.
- Tone fixes must preserve meaning; it must not soften a correct no into a misleading maybe.
- Safety and policy checks always win over tone. Never pass a hard-block for the sake of politeness.

Output: the guardrail system prompt, decision definitions, the ordered check sequence, rewrite rules, output schema, and audit-logging spec.

Success signal: the output is good only if the guardrail returns a fixed decision schema, runs safety/policy checks before tone, blocks every unverified policy claim, and routes all sensitive triggers to a human rather than auto-resolving.
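
Illustration (not part of the prompt text): the fixed decision schema the prompt asks for could be enforced in code roughly as follows. This is a minimal sketch under assumptions. The class names `Decision` and `GuardrailResult`, the field types, and the 0.0 to 1.0 confidence scale are hypothetical; the prompt itself only names the fields and when each one applies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PASS = "PASS"
    REWRITE = "REWRITE"
    BLOCK = "BLOCK"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class GuardrailResult:
    """Hypothetical container for the schema fields named in the prompt."""
    decision: Decision
    reason: str
    confidence: float                 # assumed 0.0-1.0; the prompt does not fix a scale
    final_text: Optional[str] = None  # required for PASS / REWRITE
    route_to: Optional[str] = None    # required for BLOCK / ESCALATE

    def __post_init__(self) -> None:
        if self.decision in (Decision.PASS, Decision.REWRITE) and not self.final_text:
            raise ValueError("PASS/REWRITE requires final_text")
        if self.decision in (Decision.BLOCK, Decision.ESCALATE) and not self.route_to:
            raise ValueError("BLOCK/ESCALATE requires route_to")
        if self.decision is Decision.BLOCK and self.final_text is not None:
            raise ValueError("BLOCK must not carry text to send")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Validating at construction time means a malformed guardrail response fails loudly before anything is delivered, which matches the "no prose beyond the schema" requirement.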

Use case

Use when you already have a support agent generating replies and need a safety net that enforces tone and policy on every message.

When to use this

Between reply generation and customer delivery. It is a filter, not the agent itself.
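
To make the filter's placement concrete, here is a minimal sketch of an ordered check pipeline that sits between the drafting agent and delivery. Everything in it is an assumption for illustration: the keyword lists, the function names, and the `human_review` queue name are hypothetical, and simple substring matching stands in for whatever rule engine or model call a real guardrail would use. The factual-discipline check is omitted for brevity.

```python
from typing import Optional

# Hypothetical keyword lists standing in for real rule sets or classifiers.
HARD_BLOCK_TERMS = ("we will refund", "guarantee", "legal advice")
SENSITIVE_TERMS = ("lawyer", "cancel my account", "hurt myself")


def hard_block_scan(draft: str, customer_msg: str) -> Optional[dict]:
    """Step (a): policy violations in the draft are blocked outright."""
    hit = next((t for t in HARD_BLOCK_TERMS if t in draft.lower()), None)
    if hit:
        return {"decision": "BLOCK", "reason": f"hard policy block: {hit}",
                "route_to": "human_review"}
    return None


def sensitive_trigger_scan(draft: str, customer_msg: str) -> Optional[dict]:
    """Step (b): sensitive language from the customer goes to a human."""
    hit = next((t for t in SENSITIVE_TERMS if t in customer_msg.lower()), None)
    if hit:
        return {"decision": "ESCALATE", "reason": f"sensitive trigger: {hit}",
                "route_to": "human_review"}
    return None


def tone_check(draft: str, customer_msg: str) -> Optional[dict]:
    """Step (c): one example brand-voice rule, 'no exclamations'."""
    if "!" in draft:
        return {"decision": "REWRITE", "final_text": draft.replace("!", "."),
                "reason": "removed exclamation marks"}
    return None


# Order matters: safety and policy run before tone.
CHECKS = (hard_block_scan, sensitive_trigger_scan, tone_check)


def run_guardrail(draft: str, customer_msg: str) -> dict:
    """Return the first decision any check produces, else PASS."""
    for check in CHECKS:
        result = check(draft, customer_msg)
        if result is not None:
            return result
    return {"decision": "PASS", "final_text": draft, "reason": "no rule triggered"}


# A draft that breaks both a policy rule and a tone rule is blocked, not rewritten.
print(run_guardrail("We will refund you today!", "Where is my order?")["decision"])  # BLOCK
```

Because the checks run in a fixed tuple order and the first hit wins, a draft can never be "fixed" for tone and sent while it still contains a hard-block violation.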

Follow-up prompts

  • Build the policy-rules document this layer enforces, versioned and reviewable.
  • Write a test suite of 30 risky replies to validate that the guardrail catches them.
  • Design the human-review queue for messages the guardrail blocks or rewrites.
#customer-support #guardrails #ai-agents #policy #automation
Source
promptfork seed
License
CC-BY-4.0
Published
6/22/2026
