pii-masking-patterns (verified)

PII detection and masking for LLM observability. Use when logging prompts/responses, tracing with Langfuse, or protecting sensitive data in production LLM pipelines.

Marketplace: orchestkit (yonatangross/orchestkit)
Plugin: ork-ai-observability (ai)
Repository: yonatangross/orchestkit (55 stars)
Path: plugins/ork-ai-observability/skills/pii-masking-patterns/SKILL.md
Last Verified: February 4, 2026

Install Skill

npx add-skill https://github.com/yonatangross/orchestkit/blob/main/plugins/ork-ai-observability/skills/pii-masking-patterns/SKILL.md -a claude-code --skill pii-masking-patterns

Installation paths:

Claude: .claude/skills/pii-masking-patterns/

Instructions

# PII Masking Patterns

Protect sensitive data in LLM observability pipelines with automated PII detection and redaction.

## Overview

This skill covers:

- Masking PII before logging prompts and responses
- Integrating with Langfuse tracing via mask callbacks
- Using Microsoft Presidio for enterprise-grade detection
- Implementing LLM Guard for input/output sanitization
- Pre-logging redaction with structlog/loguru

## Quick Reference

### Langfuse Mask Callback (Recommended)

```python
import re
from langfuse import Langfuse

def mask_pii(data, **kwargs):
    """Mask PII before sending to Langfuse."""
    if isinstance(data, str):
        # Credit cards
        data = re.sub(r'\b(?:\d[ -]*?){13,19}\b', '[REDACTED_CC]', data)
        # Emails
        data = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[REDACTED_EMAIL]', data)
        # Phone numbers
        data = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[REDACTED_PHONE]', data)
        # SSN
        data = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED_SSN]', data)
    return data

# Initialize with masking
langfuse = Langfuse(mask=mask_pii)
```
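
The callback above only masks plain strings, but Langfuse may pass nested structures (dicts of messages, lists) as trace input and output. A minimal sketch of a recursive wrapper, assuming the `mask_pii` function defined above (the wrapper name is illustrative):

```python
def mask_pii_deep(data, **kwargs):
    """Recursively apply mask_pii to strings inside nested dicts and lists."""
    if isinstance(data, dict):
        return {key: mask_pii_deep(value) for key, value in data.items()}
    if isinstance(data, list):
        return [mask_pii_deep(item) for item in data]
    return mask_pii(data)

# Use the recursive wrapper as the mask callback instead
langfuse = Langfuse(mask=mask_pii_deep)
```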

### Microsoft Presidio Pipeline

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def anonymize_text(text: str, language: str = "en") -> str:
    """Detect and anonymize PII using Presidio."""
    results = analyzer.analyze(text=text, language=language)
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text
```
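
Presidio's default operator replaces each entity with a generic tag; to get the same `[REDACTED_*]` placeholders as the regex example, you can pass per-entity `OperatorConfig` overrides. A sketch reusing the `analyzer`/`anonymizer` instances above (entity choices and placeholder strings are illustrative):

```python
from presidio_anonymizer.entities import OperatorConfig

def anonymize_with_placeholders(text: str) -> str:
    """Anonymize PII, replacing each entity type with a custom placeholder."""
    results = analyzer.analyze(text=text, language="en")
    anonymized = anonymizer.anonymize(
        text=text,
        analyzer_results=results,
        operators={
            "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "[REDACTED_EMAIL]"}),
            "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "[REDACTED_PHONE]"}),
            "DEFAULT": OperatorConfig("replace", {"new_value": "[REDACTED]"}),
        },
    )
    return anonymized.text

print(anonymize_with_placeholders("Email jane@example.com or call 212-555-0199"))
```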

### LLM Guard Sanitization

```python
from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Sensitive
from llm_guard.vault import Vault

vault = Vault()  # Stores original values for deanonymization

# Input sanitization
input_scanner = Anonymize(vault, preamble="", language="en")
sanitized_prompt, is_valid, risk_score = input_scanner.scan(prompt)

# Output sanitization
output_scanner = Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"], redact=True)  # entity types illustrative
sanitized_response, is_valid, risk_score = output_scanner.scan(sanitized_prompt, response)
```
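
Because the `Vault` stores the original values, LLM Guard can restore them in the model's response after scanning. A brief sketch, assuming the `vault` and `sanitized_prompt` from above and a `model_output` string:

```python
from llm_guard.output_scanners import Deanonymize

# Restore the original values that Anonymize replaced
deanonymizer = Deanonymize(vault)
restored_output, is_valid, risk_score = deanonymizer.scan(sanitized_prompt, model_output)
```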

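### Pre-Logging Redaction with structlog

The overview also lists pre-logging redaction with structlog/loguru. A minimal sketch of a structlog processor that applies regex redaction before any log entry is rendered (the processor name and patterns are illustrative, mirroring the `mask_pii` example above):

```python
import re
import structlog

PII_PATTERNS = [
    (re.compile(r'\b[\w.-]+@[\w.-]+\.\w+\b'), '[REDACTED_EMAIL]'),
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), '[REDACTED_SSN]'),
]

def redact_pii(logger, method_name, event_dict):
    """structlog processor: redact PII in string values before rendering."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern, replacement in PII_PATTERNS:
                value = pattern.sub(replacement, value)
            event_dict[key] = value
    return event_dict

structlog.configure(
    processors=[
        redact_pii,  # runs on every event before the renderer sees it
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("llm_response", output="Contact me at jane@example.com")
# -> {"output": "Contact me at [REDACTED_EMAIL]", "event": "llm_response"}
```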