Cloudflare Workers AI for serverless GPU inference. Use for LLMs, text/image generation, and embeddings, or when encountering AI_ERROR, rate limit, or token-exceeded errors.
secondsky/claude-skills · cloudflare-workers-ai · January 24, 2026

Install:

`npx add-skill https://github.com/secondsky/claude-skills/blob/main/plugins/cloudflare-workers-ai/skills/cloudflare-workers-ai/SKILL.md -a claude-code --skill cloudflare-workers-ai`

Installation path: `.claude/skills/cloudflare-workers-ai/`

# Cloudflare Workers AI - Complete Reference
Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.
**Status**: Production Ready ✅
**Last Updated**: 2025-11-21
**Dependencies**: cloudflare-worker-base (for Worker setup)
**Latest Versions**: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0
---
## Table of Contents
1. [Quick Start (5 minutes)](#quick-start-5-minutes)
2. [Workers AI API Reference](#workers-ai-api-reference)
3. [Model Selection Guide](#model-selection-guide)
4. [Common Patterns](#common-patterns)
5. [AI Gateway Integration](#ai-gateway-integration)
6. [Rate Limits & Pricing](#rate-limits--pricing)
7. [Production Checklist](#production-checklist)
---
## Quick Start (5 minutes)
### 1. Add AI Binding
**wrangler.jsonc:**
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```
### 2. Run Your First Model
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });

    return Response.json(response);
  },
};
```
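For text-generation models, the non-streaming result is a plain JSON object whose generated text lives in a `response` field. Below is a minimal sketch of extracting just that text before returning it; the exact output shape varies by model family, so treat the field access as an assumption to verify against your model's schema.

```typescript
// Sketch: pull the generated text out of a non-streaming text-generation result.
// Assumes the { response: string } output shape used by the Llama instruct models;
// other model families may differ.
const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'Answer in one sentence.' },
    { role: 'user', content: 'What is Cloudflare?' },
  ],
});

const text = (result as { response?: string }).response ?? '';
return new Response(text, { headers: { 'content-type': 'text/plain' } });
```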
### 3. Add Streaming (Recommended)
```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```
**Why streaming?**
- Prevents buffering large responses in memory
- Faster time-to-first-token
- Better user experience for long-form content
- Avoids Worker timeout issues (see the client-side consumption sketch below)
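
On the client, the stream arrives as server-sent events. Here is a minimal consumer sketch, assuming the Worker above is routed at a hypothetical `/api/chat` path and that chunks follow the common `data: {"response": "..."}` SSE format terminated by `data: [DONE]`; adjust the path and parsing to your route and model.

```typescript
// Sketch: read the SSE stream from the Worker and hand tokens to a callback as they arrive.
// The endpoint path and exact chunk format are assumptions; verify against your deployment.
async function streamChat(prompt: string, onToken: (token: string) => void): Promise<void> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!res.body) throw new Error('No response body');

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by blank lines; keep a trailing partial event in the buffer.
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? '';

    for (const event of events) {
      for (const line of event.split('\n')) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice('data: '.length);
        if (data === '[DONE]') return;
        try {
          const parsed = JSON.parse(data) as { response?: string };
          if (parsed.response) onToken(parsed.response);
        } catch {
          // Ignore malformed or partial chunks.
        }
      }
    }
  }
}
```

The same loop works in any runtime with `fetch` and `ReadableStream`; only the endpoint path and chunk format are environment-specific assumptions here.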
---
## Workers AI API Reference
### Core API: `env.AI.run()`
```typescript
const response = await env.AI.run(model, inputs, options?);
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (e.g., `prompt` or `messages` for text generation) |
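
As an illustration, a fuller call can combine chat-style `messages` with common text-generation parameters and pass a third options argument. The gateway id below is a placeholder; check the exact option names against the AI Gateway Integration section before relying on them.

```typescript
// Sketch: a fuller env.AI.run() call. The gateway id is a placeholder; see the
// AI Gateway Integration section for the full set of gateway options.
const answer = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    messages: [
      { role: 'system', content: 'Be concise.' },
      { role: 'user', content: 'Explain Workers AI in two sentences.' },
    ],
    max_tokens: 256,
    temperature: 0.7,
  },
  {
    // Optional: route the request through an AI Gateway for caching and analytics.
    gateway: { id: 'my-gateway' },
  }
);
```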