guardrails

# Guardrails

Skill for implementing security guardrails and quality control.

## 4-Layer Security Architecture

```
┌─────────────────────────────────────────────────────┐
│                 LAYER 1: Input                       │
│ - Harmlessness screen (lightweight LLM)             │
│ - Pattern matching (jailbreak regex)                │
│ - PII detection/redaction                           │
└─────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│                 LAYER 2: System                      │
│ - Ethical guardrails in system prompt               │
│ - Explicit capability limits                        │
│ - Refusal instructions                              │
└─────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│                 LAYER 3: Output                      │
│ - Format validation                                 │
│ - Hallucination detection                           │
│ - Compliance check                                  │
└─────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│                 LAYER 4: Monitoring                  │
│ - Logs of all interactions                          │
│ - Alerts on suspicious patterns                     │
│ - Rate limiting per user                            │
└─────────────────────────────────────────────────────┘
```

## References

- [Input Guardrails](./references/input-guardrails.md) - Topical checks, jailbreak detection, PII redaction
- [Output Guardrails](./references/output-guardrails.md) - Format validation, hallucination detection, tool call validation

## Ethical Guardrails Template

```markdown
<<ethical_guardrails>>

You are bound by strict ethical and legal limits.

R
Marketplace

Plugin

Repository

Last Verified

Install Skill

Instructions

Validation Details