slo-sli-error-budget

verified

Use when defining SLOs, selecting SLIs, or implementing error budget policies. Covers reliability targets, SLI selection, and error budget management.

Marketplace

melodic-software

melodic-software/claude-code-plugins

Plugin

systems-design

plugins/systems-design/skills/slo-sli-error-budget/SKILL.md

Last Verified

January 21, 2026

Install Skill

npx add-skill https://github.com/melodic-software/claude-code-plugins/blob/main/plugins/systems-design/skills/slo-sli-error-budget/SKILL.md -a claude-code --skill slo-sli-error-budget

Installation paths:

Claude
.claude/skills/slo-sli-error-budget/

Instructions

# SLOs, SLIs, and Error Budgets

Patterns and practices for defining service level objectives, selecting meaningful indicators, and managing reliability through error budgets.

## When to Use This Skill

- Defining SLOs for services
- Selecting appropriate SLIs
- Implementing error budget policies
- Balancing reliability and velocity
- Setting up SLO-based alerting

## Core Concepts

### SLI (Service Level Indicator)

```text
SLI = Quantitative measure of service level

What to measure:
- Availability: % of successful requests
- Latency: % of requests faster than threshold
- Throughput: Requests per second
- Error rate: % of failed requests

Formula:
SLI = (good events / total events) × 100%

Example:
Availability SLI = (successful requests / total requests) × 100%
             = (99,500 / 100,000) × 100%
             = 99.5%
```
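The ratio formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `sli_percent` and the choice to raise on zero traffic are assumptions made for the example.

```python
def sli_percent(good_events: int, total_events: int) -> float:
    """Return the SLI as a percentage: (good events / total events) x 100."""
    if total_events <= 0:
        # Assumption: with no traffic the SLI is undefined, so we refuse to guess.
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return 100 * good_events / total_events


# Availability example from the text: 99,500 successful out of 100,000 requests.
availability = sli_percent(good_events=99_500, total_events=100_000)
print(f"Availability SLI: {availability:.1f}%")
```

The same function covers the latency SLI if "good" is counted as requests faster than the chosen threshold.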

### SLO (Service Level Objective)

```text
SLO = Target value for an SLI

Format: SLI >= Target over Time Window

Examples:
- 99.9% of requests successful over 30 days
- 95% of requests complete in <200ms over 7 days
- 99.95% availability measured monthly

Components:
┌─────────────────────────────────────────────────────┐
│ SLO = SLI + Target + Time Window                    │
│                                                      │
│ "99.9% of HTTP requests return non-5xx             │
│  over a rolling 30-day window"                      │
└─────────────────────────────────────────────────────┘
```
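One way to make the three components explicit is to model an SLO as a small record and check a measured SLI against it. The sketch below is hypothetical: the `Slo` class, its field names, and the `is_met` method are illustrative assumptions, not an API defined by this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """An SLO = SLI description + target + time window (names are illustrative)."""

    sli_description: str   # what counts as a good event
    target_percent: float  # e.g. 99.9
    window_days: int       # e.g. a rolling 30-day window

    def is_met(self, good_events: int, total_events: int) -> bool:
        """Return True if the SLI measured over the window meets the target."""
        if total_events <= 0:
            # Assumption: compliance is undefined without any events.
            raise ValueError("total_events must be positive")
        return 100 * good_events / total_events >= self.target_percent


# The boxed example: 99.9% of HTTP requests return non-5xx over 30 days.
http_slo = Slo("HTTP requests returning non-5xx", 99.9, 30)
print(http_slo.is_met(good_events=999_500, total_events=1_000_000))
```

Keeping the window on the record matters because the same target is stricter or looser depending on the period the counts are taken over.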

### Error Budget

```text
Error Budget = Allowed unreliability

If SLO = 99.9% availability:
Error Budget = 100% - 99.9% = 0.1%

Over 30 days:
Total minutes = 30 × 24 × 60 = 43,200
Error budget = 43,200 × 0.001 = 43.2 minutes

Or in requests (assuming 1M requests/month):
Error budget = 1,000,000 × 0.001 = 1,000 failed requests

Budget consumption:
┌────────────────────────────────────────────────────┐
│ Error Budget Remaining: 65%                        │
│ ████████████████████░░░░░░░░░░                     │
│ Consumed: 35% (15 min of 43.2 min)                 │
└────────────────────────────────────────────────────┘
```
