Configuring Guardrails

Guardrails define the boundaries your agent should never cross, regardless of what it's been asked or how the conversation unfolds. They're different from instructions. Instructions tell the agent what to do. Guardrails tell it what it must never do (and what it must always do), and when the two conflict, guardrails take priority.

This matters because instructions are flexible by nature. You want the agent to interpret them in context and adapt. Guardrails are the opposite. They're rigid by design. If a guardrail says "never share internal pricing," the agent won't share it even if the instructions say "answer all customer questions fully." That's intentional, and it's what makes guardrails the right place for non-negotiable rules.

Built-in guardrails  

Every agent comes with two built-in guardrails that can be toggled on or off during creation or from the agent details page.

Fairness & Bias monitors for potential biases in the agent's responses, whether they stem from patterns in the training data, cultural assumptions, or skewed reasoning. When enabled, it flags and adjusts responses that could reflect unfair treatment of any group or perspective.

Toxicity Check prevents the agent from generating harmful, offensive, or toxic content. This includes profanity, hate speech, and any language that could make interactions hostile or uncomfortable.

Both are enabled by default when you create an agent. You can toggle them off if your use case requires it, but for most production agents, keeping them on is recommended.

Custom guardrails  

Custom guardrails let you define rules specific to your agent's role and your organization's requirements. Toggle on Custom Guardrails during agent creation (Step 4: Additional Settings) or from the agent details page, and you'll see two sections: Do's and Don'ts.

  • Do's are behaviors the agent should always follow. These are affirmative rules that reinforce how the agent should behave in specific situations.

  • Don'ts are behaviors the agent should never exhibit. These are hard boundaries that override everything else.

Click + Add Do's or + Add Don'ts to add rules. Each rule is a numbered entry. Be specific and concrete rather than vague and broad.

Writing effective guardrails  

The difference between a useful guardrail and a useless one usually comes down to specificity.

Vague guardrails don't help. A rule like "Be professional" gives the agent almost nothing to work with. What counts as professional depends entirely on context, and the agent is already following its instructions for tone and behavior. A guardrail this broad adds no real boundary.

Specific guardrails work. A rule like "Never disclose pricing for enterprise plans" is clear, testable, and enforceable. The agent knows exactly what to avoid and when.

Here are some examples of well-written guardrails:

Do's:

  • Always include the ticket ID when referencing a support case

  • Cite the specific document name when providing information from the knowledge base

  • Ask for confirmation before performing any action that modifies a record

Don'ts:

  • Never share internal employee names or contact details with external users

  • Never provide legal, medical, or financial advice

  • Never compare Zoho products negatively against competitors

  • Never fabricate information if the knowledge base doesn't contain the answer

When to use a guardrail vs an instruction  

This is where it sometimes gets confusing. Both instructions and guardrails influence the agent's behavior, but they serve different purposes.

  • Use instructions when the rule is about how the agent does its job on a day-to-day basis: tone, format, workflow logic, which tools to use, and how to structure a response. These are operational guidelines that the agent interprets in context.

  • Use guardrails when the rule is a hard boundary that should hold no matter what: compliance requirements, data privacy constraints, and anything the agent must always do or must never do regardless of the conversation. If you'd be alarmed to see the agent break the rule even once, it belongs in guardrails.

A practical test: if the rule has the word "never" or "always" in it and you mean it literally, it's a guardrail. If it's more like "generally" or "when possible," it's an instruction.
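If you draft rules in bulk before entering them, the practical test above can be roughed out as a small triage script. This is only a sketch: `suggest_placement` and both word lists are hypothetical and not part of the product, and the script cannot judge whether you mean "never" or "always" literally, so treat its output as a first pass to review by hand.

```python
import re

# Hypothetical word lists taken from the practical test; extend them as needed.
ABSOLUTE = re.compile(r"\b(never|always)\b", re.IGNORECASE)
SOFT = re.compile(r"\b(generally|when possible)\b", re.IGNORECASE)


def suggest_placement(rule: str) -> str:
    """Suggest 'guardrail' or 'instruction' for a draft rule."""
    # A softening phrase means the rule is not meant literally,
    # so it is treated as an instruction even if it also says "always".
    if SOFT.search(rule):
        return "instruction"
    if ABSOLUTE.search(rule):
        return "guardrail"
    # No absolute wording at all: default to an instruction.
    return "instruction"


drafts = [
    "Never provide legal, medical, or financial advice",
    "Generally keep answers short",
    "Always offer a callback when possible",
]
for draft in drafts:
    print(f"{suggest_placement(draft):<12} {draft}")
```

Checking the softening phrases first is a deliberate choice: a rule that hedges itself is one you would tolerate being bent, which is the opposite of a hard boundary.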

How conflicts are resolved  

If an instruction and a guardrail point in different directions, the guardrail wins. Every time, without exception.

For example, say your instructions tell the agent to "provide detailed answers to all customer questions about our products." But a guardrail says "never disclose the technical architecture of our backend systems." If a customer asks how the backend works, the agent follows the guardrail and declines to answer that specific question, while continuing to answer other product questions as instructed.

This priority system is what makes guardrails reliable as a safety mechanism. You don't have to worry about edge cases in your instructions accidentally overriding a critical boundary.
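One way to picture the priority order is as a check that runs before any instruction is applied. The sketch below is a toy model only: the `Dont` class, the `resolve` function, and the idea of matching on a topic label are all assumptions made for illustration, and they say nothing about how the product enforces guardrails internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Dont:
    """Hypothetical model of a Don't: the rule text plus the topics it blocks."""
    rule: str
    blocked_topics: FrozenSet[str]


def resolve(question_topic: str, donts: List[Dont]) -> str:
    """Return 'decline' if any Don't covers the topic, otherwise 'answer'."""
    # Guardrails are consulted first, so no instruction can override them.
    for dont in donts:
        if question_topic in dont.blocked_topics:
            return "decline"
    # No guardrail applies, so the instruction to answer product questions holds.
    return "answer"


donts = [
    Dont(
        rule="Never disclose the technical architecture of our backend systems",
        blocked_topics=frozenset({"backend architecture"}),
    )
]

print(resolve("backend architecture", donts))  # decline
print(resolve("product features", donts))      # answer
```

The instruction never appears as an input to the guardrail check, which is the point: however the instructions are worded, they cannot change the outcome for a blocked topic.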
