observability

verified

Implement logging, metrics, tracing, and alerting for production applications. Covers structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), distributed tracing (OpenTelemetry), and alert design. Use this skill when adding logging to services, setting up monitoring, creating alerts, debugging production issues, or designing SLIs/SLOs. Triggers on "logging", "monitoring", "alerting", "observability", "metrics", "tracing", "debug production", "correlation id", "structured logging", "dashboards", "SLI", "SLO".

Marketplace: pokayokay (srstomp/pokayokay)

Plugin: pokayokay (productivity)

Repository: srstomp/pokayokay (2 stars)

plugins/pokayokay/skills/observability/SKILL.md

Last Verified: January 23, 2026

Install Skill

npx add-skill https://github.com/srstomp/pokayokay/blob/main/plugins/pokayokay/skills/observability/SKILL.md -a claude-code --skill observability

Installation paths:

Claude
.claude/skills/observability/

Instructions

# Observability

Implement the three pillars of observability: logs, metrics, and traces.

## The Three Pillars

| Pillar | Purpose | Key Question |
|--------|---------|--------------|
| **Logs** | Discrete events with context | What happened? |
| **Metrics** | Aggregated measurements | How much/many? |
| **Traces** | Request flow across services | Where did time go? |

**Quick pick:**
- Need to debug a specific request? → Logs + Traces
- Need to alert on thresholds? → Metrics
- Need to understand system health? → All three
- Starting from zero? → Logs first, then metrics, then traces

## Logging Fundamentals

### Log Level Selection

```
FATAL → System is unusable, immediate action required
ERROR → Operation failed, needs attention soon
WARN  → Unexpected but recoverable, investigate later
INFO  → Significant business events, state changes
DEBUG → Detailed diagnostic info (off by default in production)
TRACE → Most granular, function-level (dev only)
```

**Decision guide:**

| Situation | Level | Example |
|-----------|-------|---------|
| Payment succeeded | INFO | `{ level: 'info', event: 'payment_completed', amount: 50 }` |
| Payment retry needed | WARN | `{ level: 'warn', event: 'payment_retry', attempt: 2 }` |
| Payment failed | ERROR | `{ level: 'error', event: 'payment_failed', code: 'DECLINED' }` |
| Database connection lost | FATAL | `{ level: 'fatal', event: 'db_connection_lost' }` |
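
The level list and decision guide above imply an ordering: a logger configured with a minimum level drops everything below it. A minimal sketch of that filtering follows, assuming numeric severities that mirror the order of the list (the same values Pino uses by default). The `defaultLevel` and `shouldLog` helpers are hypothetical names for illustration, not part of any library.

```typescript
// Severity order follows the level list above; higher means more severe.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 } as const;
type Level = keyof typeof LEVELS;

// Hypothetical helper: choose the minimum level for an environment.
// Production starts at INFO so DEBUG and TRACE stay off unless enabled explicitly.
function defaultLevel(env: string | undefined): Level {
  return env === 'production' ? 'info' : 'debug';
}

// A record is emitted only when its level is at or above the configured minimum.
function shouldLog(recordLevel: Level, minLevel: Level): boolean {
  return LEVELS[recordLevel] >= LEVELS[minLevel];
}

const minLevel = defaultLevel('production');
console.log(shouldLog('debug', minLevel)); // false: DEBUG is filtered out in production
console.log(shouldLog('warn', minLevel));  // true: WARN and above pass
```

Keeping the threshold in one place makes it straightforward to lower it temporarily (for example to `debug`) while investigating an incident.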

### Structured Logging Pattern

```typescript
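// Assumes a Pino logger; any structured logger that accepts an object works similarly.
import pino from 'pino';
const logger = pino();

// ✅ Good: Structured, searchable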
logger.info({
  event: 'order_created',
  orderId: '123',
  userId: 'user_456',
  amount: 99.99,
  duration_ms: 45
});

// ❌ Bad: Values baked into the message string, so they cannot be queried by field
logger.info('Order 123 created for user user_456 with amount 99.99');
```
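
To illustrate why the structured form is "searchable", here is a small sketch that filters newline-delimited JSON log lines by field. The sample lines and the `findByEvent` helper are made up for illustration; in practice a log backend would usually run this kind of query for you.

```typescript
// Sample newline-delimited JSON, shaped like the structured examples above (made-up data).
const lines: string[] = [
  '{"level":"info","event":"order_created","orderId":"123","amount":99.99}',
  '{"level":"error","event":"payment_failed","orderId":"124","code":"DECLINED"}',
  '{"level":"info","event":"order_created","orderId":"125","amount":15}',
];

// Hypothetical helper: parse each line and keep records whose `event` field matches.
function findByEvent(ndjson: string[], event: string): Record<string, unknown>[] {
  return ndjson
    .map((line) => JSON.parse(line) as Record<string, unknown>)
    .filter((record) => record.event === event);
}

const created = findByEvent(lines, 'order_created');
console.log(created.map((record) => record.orderId)); // logs the ids '123' and '125'
```

The same query against concatenated message strings would need a fragile regular expression for every message format.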

**Required fields for every log:**
- `timestamp` (ISO 8601)
- `level` (string)
- `message` or `event` (what happened)
- `correlation_id` (request tracing)
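
One way to make the required fields hard to forget is to build every entry through a single function. The sketch below assumes Node.js (for `randomUUID` from `node:crypto`); `buildLogEntry` and `resolveCorrelationId` are hypothetical helper names, not an API from Pino or Winston.

```typescript
import { randomUUID } from 'node:crypto';

type LogLevel = 'trace' | 'debug' | 'info' | 'warn' | 'error' | 'fatal';

interface LogEntry {
  timestamp: string;        // ISO 8601
  level: LogLevel;
  event: string;            // what happened
  correlation_id: string;   // ties together all logs for one request
  [field: string]: unknown; // contextual fields such as user_id or duration_ms
}

// Reuse an id passed in by the caller (for example from a request header);
// otherwise generate a fresh one.
function resolveCorrelationId(incoming?: string): string {
  return incoming && incoming.length > 0 ? incoming : randomUUID();
}

// Required fields are written after the context spread so context cannot overwrite them.
function buildLogEntry(
  level: LogLevel,
  event: string,
  correlationId: string,
  context: Record<string, unknown> = {},
): LogEntry {
  return {
    ...context,
    timestamp: new Date().toISOString(),
    level,
    event,
    correlation_id: correlationId,
  };
}

const entry = buildLogEntry('info', 'order_created', resolveCorrelationId(), {
  user_id: 'user_456',
  duration_ms: 45,
});
console.log(JSON.stringify(entry));
```

Most structured loggers can attach these fields automatically (for example through child loggers or middleware), so a hand-rolled builder like this is mainly useful as a specification of what every line must contain.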

**Contextual fields (when applicable):**
- `user_id`, `tenant_id` (who)
- `duration_ms` (how long)
- `error.message`, `error.st

Validation Details

Instruction Length: 11851 chars