Provider-native prompt caching for Claude and OpenAI. Use when optimizing LLM costs with cache breakpoints, caching system prompts, or reducing token costs for repeated prefixes.
February 4, 2026
npx add-skill https://github.com/yonatangross/orchestkit/blob/main/plugins/ork-llm/skills/prompt-caching/SKILL.md -a claude-code --skill prompt-caching

Installation paths:
.claude/skills/prompt-caching/

# Prompt Caching
Cache LLM prompt prefixes to cut the cost of repeated input tokens by up to 90%.
## Supported Models (2026)
| Provider | Models |
|----------|--------|
| Claude | Opus 4.1, Opus 4, Sonnet 4.5, Sonnet 4, Sonnet 3.7, Haiku 4.5, Haiku 3.5, Haiku 3 |
| OpenAI | gpt-5.2, gpt-5.2-mini, o3, o3-mini (automatic caching) |
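
Because OpenAI caching is automatic, there are no cache markers to set. Cache hits depend on requests sharing an identical prompt prefix, so the main lever is ordering: static content first, per-request content last. The following is a minimal sketch of that ordering, assuming the Chat Completions message format. `build_openai_messages` and `shared_prefix_length` are hypothetical helpers introduced here for illustration, not part of any SDK.

```python
from __future__ import annotations


def build_openai_messages(
    static_instructions: str,
    few_shot_examples: str | None,
    user_content: str,
) -> list[dict]:
    """Put reusable content first so repeated requests share a prefix."""
    system_text = static_instructions
    if few_shot_examples:
        # Examples are static too, so they belong in the shared prefix.
        system_text = f"{static_instructions}\n\n{few_shot_examples}"
    return [
        {"role": "system", "content": system_text},
        # Dynamic content goes last so it never breaks the shared prefix.
        {"role": "user", "content": user_content},
    ]


def shared_prefix_length(a: list[dict], b: list[dict]) -> int:
    """Count the leading messages that are identical in two requests."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


first = build_openai_messages("You are a reviewer.", "Example: ...", "Review file A")
second = build_openai_messages("You are a reviewer.", "Example: ...", "Review file B")
print(shared_prefix_length(first, second))
```

Here the two requests differ only in the final user message, so the system message is the part that can be reused across calls.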
## Claude Prompt Caching
```python
def build_cached_messages(
    system_prompt: str,
    few_shot_examples: str | None,
    user_content: str,
    use_extended_cache: bool = False
) -> list[dict]:
    """Build messages with cache breakpoints.

    Cache structure (processing order: tools → system → messages):
    1. System prompt (cached)
    2. Few-shot examples (cached)
    ─────── CACHE BREAKPOINT ───────
    3. User content (NOT cached)
    """
    # TTL: "5m" (default, 1.25x write cost) or "1h" (extended, 2x write cost)
    ttl = "1h" if use_extended_cache else "5m"
    content_parts = []

    # Breakpoint 1: System prompt
    content_parts.append({
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral", "ttl": ttl}
    })

    # Breakpoint 2: Few-shot examples (up to 4 breakpoints allowed)
    if few_shot_examples:
        content_parts.append({
            "type": "text",
            "text": few_shot_examples,
            "cache_control": {"type": "ephemeral", "ttl": ttl}
        })

    # Dynamic content (NOT cached)
    content_parts.append({
        "type": "text",
        "text": user_content
    })

    return [{"role": "user", "content": content_parts}]
```
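
When choosing between the two TTL options, it helps to estimate how many requests must reuse a prefix before the higher write cost pays off. The sketch below uses only the multipliers quoted in this document (1.25x and 2x for writes, 0.10x for reads) and assumes the simplest case: the first request writes the cache and every later request hits it before the TTL expires. The function names are hypothetical, and costs are in "base-price token units" rather than a real currency.

```python
# Multipliers relative to the base input price, as quoted in this document.
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.00}
READ_MULTIPLIER = 0.10


def cached_prefix_cost(prefix_tokens: int, requests: int, ttl: str = "5m") -> float:
    """Prefix cost over `requests` calls: one cache write, then cache reads.

    Assumption: no cache expiry between calls, so every call after the
    first is a cache hit.
    """
    if requests < 1:
        return 0.0
    write_cost = prefix_tokens * WRITE_MULTIPLIER[ttl]
    read_cost = prefix_tokens * READ_MULTIPLIER * (requests - 1)
    return write_cost + read_cost


def uncached_prefix_cost(prefix_tokens: int, requests: int) -> float:
    """Prefix cost when every call pays the full base input price."""
    return float(prefix_tokens * requests)


def savings_fraction(prefix_tokens: int, requests: int, ttl: str = "5m") -> float:
    """Fraction of prefix cost saved by caching (negative means a loss)."""
    baseline = uncached_prefix_cost(prefix_tokens, requests)
    if baseline == 0:
        return 0.0
    return 1.0 - cached_prefix_cost(prefix_tokens, requests, ttl) / baseline


for ttl in ("5m", "1h"):
    for requests in (1, 2, 3, 100):
        saved = savings_fraction(10_000, requests, ttl)
        print(f"ttl={ttl} requests={requests} savings={saved:.1%}")
```

Under these assumptions a 5-minute cache already pays off on the second request, while a 1-hour cache needs a third request to break even. The savings approach 90% only as the number of cache hits grows, since the initial write is always more expensive than an uncached call.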
## Cache Pricing (2026)
```
┌─────────────────────────────────────────────────────────────┐
│ Cache Cost Multipliers (relative to base input price) │
├─────────────────────────────────────────────────────────────┤
│ 5-minute cache write: 1.25x base input price │
│ 1-hour cache write: 2.00x base input price │
│ Cache read: 0.10x base input price (90% off!) │
└───────────────────────────────────────────