Back to Skills

prompt-caching

verified

Provider-native prompt caching for Claude and OpenAI. Use when optimizing LLM costs with cache breakpoints, caching system prompts, or reducing token costs for repeated prefixes.

View on GitHub

Marketplace

orchestkit

yonatangross/orchestkit

Plugin

ork

development

Repository

yonatangross/orchestkit
33stars

skills/prompt-caching/SKILL.md

Last Verified

January 25, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/yonatangross/orchestkit/blob/main/skills/prompt-caching/SKILL.md -a claude-code --skill prompt-caching

Installation paths:

Claude
.claude/skills/prompt-caching/
Powered by add-skill CLI

Instructions

# Prompt Caching

Cache LLM prompt prefixes for 90% token savings.

## Supported Models (2026)

| Provider | Models |
|----------|--------|
| Claude | Opus 4.1, Opus 4, Sonnet 4.5, Sonnet 4, Sonnet 3.7, Haiku 4.5, Haiku 3.5, Haiku 3 |
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini (automatic caching) |

## Claude Prompt Caching

```python
def build_cached_messages(
    system_prompt: str,
    few_shot_examples: str | None,
    user_content: str,
    use_extended_cache: bool = False
) -> list[dict]:
    """Build messages with cache breakpoints.

    Cache structure (processing order: tools → system → messages):
    1. System prompt (cached)
    2. Few-shot examples (cached)
    ─────── CACHE BREAKPOINT ───────
    3. User content (NOT cached)
    """
    # TTL: "5m" (default, 1.25x write cost) or "1h" (extended, 2x write cost)
    ttl = "1h" if use_extended_cache else "5m"

    content_parts = []

    # Breakpoint 1: System prompt
    content_parts.append({
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral", "ttl": ttl}
    })

    # Breakpoint 2: Few-shot examples (up to 4 breakpoints allowed)
    if few_shot_examples:
        content_parts.append({
            "type": "text",
            "text": few_shot_examples,
            "cache_control": {"type": "ephemeral", "ttl": ttl}
        })

    # Dynamic content (NOT cached)
    content_parts.append({
        "type": "text",
        "text": user_content
    })

    return [{"role": "user", "content": content_parts}]
```

## Cache Pricing (2026)

```
┌─────────────────────────────────────────────────────────────┐
│  Cache Cost Multipliers (relative to base input price)      │
├─────────────────────────────────────────────────────────────┤
│  5-minute cache write:  1.25x base input price              │
│  1-hour cache write:    2.00x base input price              │
│  Cache read:            0.10x base input price (90% off!)   │
└─────────────────────────────────────────────

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
5567 chars