For the complete documentation index, see llms.txt. This page is also available as Markdown.

AI Guardrails

Configure multi-stage safety checks for AI agents using risk policies, regex rules, judge models, saga processors, and reusable presets

AI guardrails let you inspect, modify, or block content inside the agent loop. You can apply them to user prompts, model responses, and tool results. This gives you a central governance layer without rebuilding AI Agent APIs or the broader GenAI Models setup.

If you need fully custom governance, you can still embed checks directly in your CallAIAgent flow. The built-in pipeline covers the most common safety controls with less effort and better reuse.

GenAI Guardrails Configuration

Risk policy

The risk policy decides how findings from multiple guardrails are combined.

Each finding carries one of four risk levels:

  • LOW for weak or informational signals

  • MEDIUM for meaningful but recoverable concerns

  • HIGH for strong policy violations

  • CRITICAL for severe violations that should usually stop execution

After all checks run for a stage, Rierino evaluates the overall result. The policy can combine findings in four ways:

  • Weighted score: assign a numeric weight to each risk level

  • Block threshold: stop processing when the total score exceeds a limit

  • Critical override: block immediately when any critical finding appears

  • Count-based rules: block when a level appears too many times

The weighted score follows a sum-product model:

score=(risk weight×number of findings)\text{score} = \sum (\text{risk weight} \times \text{number of findings})

This lets you tune strictness without rewriting each individual guardrail.

Guardrail pipeline

An agent loop runs once for each request. It can read memory, call tools, generate responses, and repeat until the task completes. Guardrails can run at three independent points in that loop.

Pipeline stages

  • Inputs: validate user prompts before the model or tools see them

  • Outputs: review model responses before they reach the user

  • Tool Responses: inspect data returned from RAG, APIs, or tool calls before it goes back into the model context

Each stage has its own configuration. You can reuse the same rules everywhere or apply stricter controls only where they matter.

Input, output, and tool-response guardrails use the same configuration model. Only the stage and risk policy differ.

Guardrail actions

Each guardrail returns one of four actions:

  • ALLOW: pass the content unchanged

  • MODIFY: mask or rewrite part of the content

  • REPROMPT: ask the model to produce a safer answer

  • BLOCK: stop the current loop for that stage

REPROMPT is most useful on output checks. It lets the model recover instead of failing the whole request immediately.

Guardrail types

Rierino supports deterministic, model-based, and flow-based guardrails. You can combine them in the same stage.

Regex matcher

Use a regex matcher when you need fast, deterministic detection. Typical cases include prompt injection phrases, script tags, SQL fragments, or identifiers that must never pass through unchanged.

The matcher checks content against explicit patterns, named presets, or preset groups.

Configuration schema

Regex masker

Use a regex masker when the content is still useful after redaction. This is common for PII, account numbers, or contact details.

The masker keeps the request moving while removing sensitive values.

Configuration schema

LLM as Judge

Use an LLM judge when the policy depends on meaning, intent, tone, or context. This works well for brand violations, unsafe advice, policy classification, or nuanced output review.

The judge runs as a separate GenAI model and returns a decision marker. Rierino converts that decision into a guardrail action.

Configuration schema

Saga processor

Use a saga processor when you need full platform logic. This is the most flexible option. It can call states, queries, systems, rules, or other sagas before returning a decision.

This is the right choice for domain-specific validation, content enrichment, or policy logic that depends on business data.

Configuration schema

The guardrail saga receives the content to inspect in the content field. It should return a payload plus decision metadata.

Saga response schema

Guardrail presets

Regex guardrails include a built-in preset registry. Presets save time and keep common rules consistent across models.

You can reference a preset directly or include a preset group. You can also override the default action, risk level, or masking behavior inside a specific guardrail.

Preset patterns

Preset
Purpose
Default risk
Default action
Default masking

email

Detect email addresses

MEDIUM

MODIFY

[EMAIL]

us-ssn

Detect US Social Security numbers

HIGH

MODIFY

* with original length preserved

us-phone

Detect US and Canada phone numbers

MEDIUM

MODIFY

[PHONE]

credit-card

Detect 13 to 19 digit card numbers

HIGH

MODIFY

[CARD]

iban

Detect IBAN account numbers

HIGH

MODIFY

[IBAN]

ipv4

Detect IPv4 addresses

LOW

MODIFY

[IP]

ipv6

Detect full-form IPv6 addresses

LOW

MODIFY

[IPV6]

dob-iso

Detect dates of birth in YYYY-MM-DD format

MEDIUM

MODIFY

[DOB]

dob-us

Detect dates of birth in MM/DD/YYYY format

MEDIUM

MODIFY

[DOB]

sql-injection

Detect common SQL injection phrasing

HIGH

BLOCK

Not used

javascript-injection

Detect script tags, javascript: URIs, and inline handlers

HIGH

BLOCK

Not used

forced-instruction

Detect common jailbreak overrides of system instructions

HIGH

BLOCK

Not used

prompt-leak

Detect attempts to reveal hidden prompts or instructions

MEDIUM

BLOCK

Not used

command-injection

Detect dangerous shell command chaining

CRITICAL

BLOCK

Not used

path-traversal

Detect repeated directory traversal sequences

MEDIUM

BLOCK

Not used

Preset groups

Group
Included presets
Typical use

pii-basic

email, us-ssn, us-phone, credit-card

Basic redaction for common personal data

pii-extended

pii-basic plus iban, ipv4, ipv6, dob-iso, dob-us

Broader privacy filtering

jailbreak-basic

sql-injection, javascript-injection, forced-instruction, prompt-leak

Baseline prompt defense

jailbreak-extended

jailbreak-basic plus command-injection, path-traversal

Stronger defense for tool-enabled agents

Last updated