Guardrails
Overview
Guardrails add safety constraints and validation to your agents so that they operate within defined boundaries. They monitor agent behavior in real time to protect sensitive data, enforce content policies, and support compliance with regulatory requirements.
The Idun Agent Platform's guardrails implementation uses Guardrails AI under the hood to provide production-ready safety mechanisms for your agents.
Work in Progress
Output guardrails are currently a work in progress. Only input guardrails are fully supported at this time.
Setting Up Guardrails
You can configure guardrails when creating or editing an agent in the Manager UI.
Step 1: Navigate to Guardrails Configuration
During agent creation:
- Navigate to the Guardrails step in the agent creation wizard
- Select the guardrail type you want to add
Step 2: Configure Guardrails
Currently supported guardrail types:
Ban List
Blocks specific keywords or phrases from agent inputs and outputs. Useful for filtering profanity, competitor names, or sensitive topics that shouldn't appear in agent conversations.
Setup:
- Select Ban List from the guardrail type dropdown
- Enter the words or phrases to block
- Click Add or Next
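To make the behavior of a ban list concrete, here is a minimal sketch in plain Python. It is not the platform's or Guardrails AI's implementation; the function name `find_banned_terms` and the sample terms are hypothetical, and whole-word, case-insensitive matching is an assumption about how you might want a ban list to behave.

```python
import re

def find_banned_terms(text: str, banned_terms: list[str]) -> list[str]:
    """Return the banned terms that occur in text as whole words or phrases."""
    hits = []
    for term in banned_terms:
        # \b word boundaries keep "scam" from matching inside "scampi".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits

banned = ["acme corp", "scam"]  # hypothetical ban list
print(find_banned_terms("I heard Acme Corp is cheaper.", banned))  # ['acme corp']
print(find_banned_terms("I love scampi.", banned))                 # []
```

Whole-word matching is one way to reduce false positives; plain substring matching would also flag "scampi" in this example.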
PII Detector
Detects and handles personally identifiable information (PII) in agent conversations. It automatically identifies sensitive data such as emails, phone numbers, or addresses, which helps maintain privacy and supports compliance with regulations such as GDPR or HIPAA.
Setup:
- Select PII Detector from the guardrail type dropdown
- Select PII types to detect from the checkboxes (e.g., email, phone, address)
- Click Add or Next
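As a rough illustration of what selecting PII types means, the sketch below detects and redacts only the types that are enabled. It is a toy under stated assumptions: the regular expressions, the `PII_PATTERNS` table, and the function names are hypothetical and far simpler than a production detector, and they do not reflect how the platform implements detection.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def detect_pii(text: str, enabled_types: list[str]) -> dict[str, list[str]]:
    """Return the matches found for each enabled PII type."""
    findings = {}
    for pii_type in enabled_types:
        matches = PII_PATTERNS[pii_type].findall(text)
        if matches:
            findings[pii_type] = matches
    return findings

def redact_pii(text: str, enabled_types: list[str]) -> str:
    """Replace each match of an enabled PII type with a placeholder tag."""
    for pii_type in enabled_types:
        text = PII_PATTERNS[pii_type].sub(f"<{pii_type.upper()}>", text)
    return text

message = "Reach me at jane@example.com or +1 415-555-0100."
print(detect_pii(message, ["email", "phone"]))
print(redact_pii(message, ["email", "phone"]))  # Reach me at <EMAIL> or <PHONE>.
```

Enabling only the types you need mirrors the checkbox selection: a type that is not enabled is never scanned for, so it cannot produce false positives.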

API Key Required
Guardrails functionality requires the GUARDRAILS_API_KEY environment variable to be configured on your system. This key authenticates your integration with Guardrails AI services. Contact your platform administrator if guardrails options are not available in the UI.
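Before opening the UI, it can help to confirm that the variable is visible to the process that runs the platform. The following is a minimal sketch using only the Python standard library; the helper name `guardrails_key_status` is hypothetical, and the check only inspects the environment, it does not validate the key against Guardrails AI.

```python
import os

def guardrails_key_status(env=None) -> str:
    """Report whether GUARDRAILS_API_KEY looks usable in the given environment."""
    env = os.environ if env is None else env
    value = env.get("GUARDRAILS_API_KEY", "")
    if not value:
        return "missing"
    if value != value.strip():
        # Stray spaces or newlines often sneak in when copying keys.
        return "has surrounding whitespace"
    return "set"

print(guardrails_key_status())
```

Run it in the same shell or container that starts the platform, since a variable exported in one session is not visible in another.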
Step 3: Test Your Guardrails
After configuration, test your guardrails before deploying to production:
- Complete agent setup and start it in a test environment
- Send inputs that should trigger guardrails (banned words, PII)
- Verify legitimate content passes through without false positives
- Refine rules based on test results
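The testing steps above can be organized as a small table of labeled cases. The sketch below is one possible harness, not a platform feature: `evaluate_guardrail` and `toy_guardrail` are hypothetical names, and `toy_guardrail` is a stand-in for whatever call sends input to your test agent and reports whether it was blocked.

```python
from typing import Callable

def evaluate_guardrail(
    is_blocked: Callable[[str], bool],
    cases: list[tuple[str, bool]],
) -> dict:
    """Compare actual blocking decisions with the expected ones."""
    report = {"passed": 0, "false_positives": [], "false_negatives": []}
    for text, should_block in cases:
        blocked = is_blocked(text)
        if blocked == should_block:
            report["passed"] += 1
        elif blocked:
            report["false_positives"].append(text)  # legitimate input was blocked
        else:
            report["false_negatives"].append(text)  # violating input slipped through
    return report

def toy_guardrail(text: str) -> bool:
    """Stand-in guardrail: naive substring match on one banned word."""
    return "scam" in text.lower()

cases = [
    ("This is a scam", True),      # should trigger
    ("I ordered scampi", False),   # legitimate content
    ("Total S.C.A.M.", True),      # edge case: obfuscated spelling
]
print(evaluate_guardrail(toy_guardrail, cases))
```

With this toy guardrail the report shows one pass, one false positive ("I ordered scampi"), and one false negative ("Total S.C.A.M."), which is the kind of signal that tells you which rules to refine.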
Best Practices
Effective Guardrail Usage
- Layer multiple guardrails for comprehensive protection - combine Ban Lists with PII detection
- Test thoroughly before production with edge cases and real user scenarios
- Monitor regularly to track trigger rates and identify false positives
- Update as needed - treat guardrails as a living system that evolves with your use case
- Balance security and UX - avoid overly restrictive rules that frustrate legitimate users
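Layering and monitoring can be pictured as a chain of checks with a counter per guardrail. This is a conceptual sketch only: the `GuardrailChain` class and the two lambda checks are hypothetical and are not part of the platform, which handles layering through the UI configuration.

```python
from collections import Counter

class GuardrailChain:
    """Run guardrails in order and count how often each one triggers."""

    def __init__(self, guardrails):
        self.guardrails = guardrails  # list of (name, check) pairs
        self.total = 0
        self.triggers = Counter()

    def check(self, text):
        """Return the name of the first guardrail that triggers, or None."""
        self.total += 1
        for name, is_violation in self.guardrails:
            if is_violation(text):
                self.triggers[name] += 1
                return name
        return None

    def trigger_rates(self):
        """Fraction of checked inputs blocked by each guardrail."""
        if self.total == 0:
            return {}
        return {name: self.triggers[name] / self.total for name, _ in self.guardrails}

chain = GuardrailChain([
    ("ban_list", lambda t: "scam" in t.lower()),  # toy ban list
    ("pii", lambda t: "@" in t),                  # toy email check
])
for message in ["hello", "a scam", "mail me a@b.co", "scam a@b.co"]:
    chain.check(message)
print(chain.trigger_rates())  # {'ban_list': 0.5, 'pii': 0.25}
```

A sudden jump in one guardrail's rate is a cue to review recent inputs for false positives before tightening or loosening the rule.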
Troubleshooting
Guardrails not working?
- Check API key: Verify GUARDRAILS_API_KEY is set correctly
- Review configuration: Ensure guardrail settings are saved and active
- Check logs: Look for guardrail-related errors in agent runtime logs
- Test patterns: Verify your test input actually matches the guardrail rules
False positives?
- Make ban list rules more specific
- Create exception lists for known safe patterns
- Adjust PII detector sensitivity
- Review user reports and refine rules regularly
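One way to think about an exception list is to mask known safe phrases before the ban list is applied. The sketch below illustrates that idea in plain Python; the function name, the masking approach, and the sample terms are all assumptions for illustration, and the source does not describe how exceptions are configured in the platform.

```python
import re

def find_banned_terms_with_exceptions(
    text: str,
    banned_terms: list[str],
    allowed_phrases: list[str],
) -> list[str]:
    """Return banned terms found in text, ignoring known safe phrases."""
    masked = text
    for phrase in allowed_phrases:
        # Blank out each safe phrase so its words cannot trigger a ban.
        masked = re.sub(re.escape(phrase), " ", masked, flags=re.IGNORECASE)
    return [
        term
        for term in banned_terms
        if re.search(r"\b" + re.escape(term) + r"\b", masked, flags=re.IGNORECASE)
    ]

banned = ["free"]            # hypothetical ban list
allowed = ["free shipping"]  # hypothetical exception list
print(find_banned_terms_with_exceptions("Do you offer free shipping?", banned, allowed))  # []
print(find_banned_terms_with_exceptions("Get free money now", banned, allowed))           # ['free']
```

Keeping exceptions as explicit phrases, rather than loosening the banned term itself, leaves the original rule intact for every other context.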
Next Steps
- Add MCP servers to extend agent capabilities
- Deploy your agent to production
- Learn about the CLI for advanced workflows