Implement logging, metrics, tracing, and alerting for production applications. Covers structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), distributed tracing (OpenTelemetry), and alert design. Use this skill when adding logging to services, setting up monitoring, creating alerts, debugging production issues, or designing SLIs/SLOs. Triggers on "logging", "monitoring", "alerting", "observability", "metrics", "tracing", "debug production", "correlation id", "structured logging", "dashboards", "SLI", "SLO".
January 23, 2026
`npx add-skill https://github.com/srstomp/pokayokay/blob/main/plugins/pokayokay/skills/observability/SKILL.md -a claude-code --skill observability`

Installation paths: `.claude/skills/observability/`

# Observability
Implement the three pillars of observability: logs, metrics, and traces.
## The Three Pillars
| Pillar | Purpose | Key Question |
|--------|---------|--------------|
| **Logs** | Discrete events with context | What happened? |
| **Metrics** | Aggregated measurements | How much/many? |
| **Traces** | Request flow across services | Where did time go? |
**Quick pick:**
- Need to debug a specific request? → Logs + Traces
- Need to alert on thresholds? → Metrics
- Need to understand system health? → All three
- Starting from zero? → Logs first, then metrics, then traces
## Logging Fundamentals
### Log Level Selection
```
FATAL → System is unusable, immediate action required
ERROR → Operation failed, needs attention soon
WARN → Unexpected but recoverable, investigate later
INFO → Significant business events, state changes
DEBUG → Detailed diagnostic info (off by default in production)
TRACE → Most granular, function-level (dev only)
```
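The ordering above is usually enforced as a numeric threshold: an entry is written only if its severity is at or above a configured minimum. The sketch below is a minimal, dependency-free illustration that borrows Pino's default numeric values (trace 10 through fatal 60). The helper names `isEnabled` and `resolveMinLevel`, and the idea of reading the minimum from a `LOG_LEVEL` variable, are assumptions rather than any library's API.

```typescript
// Numeric severities modeled on Pino's defaults; higher means more severe.
type Level = "trace" | "debug" | "info" | "warn" | "error" | "fatal";

const LEVEL_VALUES: Record<Level, number> = {
  trace: 10,
  debug: 20,
  info: 30,
  warn: 40,
  error: 50,
  fatal: 60,
};

// An entry is emitted only when its level is at or above the minimum.
function isEnabled(level: Level, minLevel: Level): boolean {
  return LEVEL_VALUES[level] >= LEVEL_VALUES[minLevel];
}

// Assumption: the raw value comes from something like a LOG_LEVEL env var.
// Unknown or missing values fall back to "info" so DEBUG/TRACE stay off.
function resolveMinLevel(raw: string | undefined): Level {
  const candidate = (raw ?? "info").toLowerCase();
  // Object.keys avoids matching inherited names such as "toString".
  return Object.keys(LEVEL_VALUES).includes(candidate) ? (candidate as Level) : "info";
}
```

A service would typically resolve the minimum once at startup (for example `resolveMinLevel(process.env.LOG_LEVEL)`) and skip any call for which `isEnabled` returns false, so verbose levels reach production output only when explicitly requested.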
**Decision guide:**
| Situation | Level | Example |
|-----------|-------|---------|
| Payment succeeded | INFO | `{ level: 'info', event: 'payment_completed', amount: 50 }` |
| Payment retry needed | WARN | `{ level: 'warn', event: 'payment_retry', attempt: 2 }` |
| Payment failed | ERROR | `{ level: 'error', event: 'payment_failed', code: 'DECLINED' }` |
| Database connection lost | FATAL | `{ level: 'fatal', event: 'db_connection_lost' }` |
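The payment rows of the table can be captured in one function, so the level decision lives in a single place instead of at every call site. This is a hypothetical sketch: the `PaymentOutcome` shape, the `retryable` flag, and the `MAX_ATTEMPTS` cap of 3 are illustrative assumptions, not any payment provider's API.

```typescript
// Hypothetical outcome shape; not taken from a real payment SDK.
type PaymentOutcome =
  | { status: "succeeded"; amount: number }
  | { status: "failed"; code: string; attempt: number; retryable: boolean };

type LogEntry = { level: "info" | "warn" | "error"; event: string } & Record<string, unknown>;

// Hypothetical retry cap, for illustration only.
const MAX_ATTEMPTS = 3;

function paymentLogEntry(outcome: PaymentOutcome): LogEntry {
  if (outcome.status === "succeeded") {
    // Significant business event: INFO.
    return { level: "info", event: "payment_completed", amount: outcome.amount };
  }
  if (outcome.retryable && outcome.attempt < MAX_ATTEMPTS) {
    // Unexpected but recoverable: WARN. Report the upcoming attempt number,
    // so a first retry logs attempt: 2 as in the table.
    return { level: "warn", event: "payment_retry", attempt: outcome.attempt + 1, code: outcome.code };
  }
  // Non-retryable decline or retries exhausted: the operation failed, ERROR.
  return { level: "error", event: "payment_failed", code: outcome.code, attempt: outcome.attempt };
}
```

With this split, a payment that succeeds after retries leaves WARN entries followed by INFO, and only exhausted or non-retryable failures reach ERROR, which keeps ERROR-based alerts meaningful.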
### Structured Logging Pattern
```typescript
// ✅ Good: Structured, searchable
logger.info({
event: 'order_created',
orderId: '123',
userId: 'user_456',
amount: 99.99,
duration_ms: 45
});
// ❌ Bad: values baked into a message string
logger.info(`Order 123 created for user user_456 with amount 99.99`);
```
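To show what the structured form looks like on the wire, here is a minimal, dependency-free JSON-lines logger. It is a sketch only: `createLogger` and its `write` parameter are hypothetical, and a production service would normally use Pino or Winston instead. Pino emits comparable newline-delimited JSON, though by default it writes `level` as a number (30 for info) and `time` as epoch milliseconds.

```typescript
type Fields = Record<string, unknown>;
type Level = "info" | "warn" | "error";

// Minimal JSON-lines logger: each call writes one JSON object per line, so
// fields such as orderId stay queryable instead of being buried in a string.
// `write` defaults to stdout; tests or transports can pass their own sink.
function createLogger(write: (line: string) => void = (line) => console.log(line)) {
  const emit = (level: Level, fields: Fields): void => {
    // Reserved keys are applied last so caller fields cannot overwrite them.
    write(JSON.stringify({ ...fields, timestamp: new Date().toISOString(), level }));
  };
  return {
    info: (fields: Fields) => emit("info", fields),
    warn: (fields: Fields) => emit("warn", fields),
    error: (fields: Fields) => emit("error", fields),
  };
}

const logger = createLogger();
logger.info({
  event: "order_created",
  orderId: "123",
  userId: "user_456",
  amount: 99.99,
  duration_ms: 45,
});
// Writes one line shaped like:
// {"event":"order_created","orderId":"123",...,"timestamp":"<ISO 8601>","level":"info"}
```

Because every entry is a single parseable object, a log backend can filter on `orderId` or aggregate `duration_ms` directly, which the concatenated string above does not allow.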
**Required fields for every log entry:**
- `timestamp` (ISO 8601)
- `level` (string)
- `message` or `event` (what happened)
- `correlation_id` (links every entry produced by the same request)
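One way to guarantee the required fields is to attach them in a single logging function and carry the correlation ID through async code with Node's built-in `AsyncLocalStorage` (from `node:async_hooks`). This is a sketch under stated assumptions: the names `log` and `withCorrelationId`, the `"none"` fallback, and reusing an inbound ID from a request header are illustrative choices, not part of any library.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type Level = "debug" | "info" | "warn" | "error" | "fatal";

// Per-request context: every log call made inside run() sees the same store.
const requestContext = new AsyncLocalStorage<{ correlationId: string }>();

// Hypothetical helper: writes one JSON line with all required fields and
// returns it (the return value is only there to make the sketch testable).
function log(level: Level, event: string, fields: Record<string, unknown> = {}): string {
  const line = JSON.stringify({
    ...fields,
    timestamp: new Date().toISOString(), // ISO 8601
    level,
    event,
    // A visible marker makes missing context propagation easy to spot.
    correlation_id: requestContext.getStore()?.correlationId ?? "none",
  });
  console.log(line);
  return line;
}

// Hypothetical entry point: reuse an inbound ID (for example from a
// correlation header) when present, otherwise generate a new UUID.
function withCorrelationId<T>(incomingId: string | undefined, handler: () => T): T {
  return requestContext.run({ correlationId: incomingId ?? randomUUID() }, handler);
}
```

In an HTTP server, `withCorrelationId` would wrap each request handler, so any `log` call made while handling that request, including after an `await`, picks up the same ID without threading it through every function signature.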
**Contextual fields (when applicable):**
- `user_id`, `tenant_id` (who)
- `duration_ms` (how long)
- `error.message`, `error.st