LLM streaming response patterns. Use when implementing real-time token streaming, Server-Sent Events for AI responses, or streaming with tool calls.
View on GitHub: yonatangross/skillforge-claude-plugin (orchestkit-complete)
January 23, 2026
# LLM Streaming
Deliver LLM responses in real-time for better UX.
## Basic Streaming (OpenAI)
```python
from openai import OpenAI

client = OpenAI()

def stream_response(prompt: str):
    """Stream tokens as they're generated (sync generator)."""
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries a delta; content is None for role/finish chunks.
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```
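The generator above can be drained by any consumer that wants both per-token updates and the final text. A minimal sketch; `collect_stream` is a hypothetical helper, not part of the OpenAI SDK:

```python
def collect_stream(tokens, on_token=lambda t: None):
    """Drain any token iterable, firing a callback per token; return the full text."""
    parts = []
    for t in tokens:
        on_token(t)      # e.g. write to a terminal or push to a UI as tokens arrive
        parts.append(t)
    return "".join(parts)

# With the generator above: collect_stream(stream_response("Hi"), print)
full = collect_stream(iter(["Hel", "lo", "!"]))
# full == "Hello!"
```

Keeping the consumer separate from the provider call makes it trivial to swap models or add buffering later.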
## Streaming with Async
```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def async_stream(prompt: str):
    """Async streaming for better concurrency."""
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```
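This skill also covers streaming with tool calls. When `stream=True` and tools are supplied, the OpenAI API delivers tool calls as fragments on `delta.tool_calls`: each fragment carries an `index`, the function name (usually only on the first fragment), and a slice of the JSON arguments, so fragments with the same index must be concatenated before the arguments parse. A minimal accumulator, sketched over plain dicts standing in for the SDK's delta objects (`accumulate_tool_calls` is a hypothetical helper):

```python
import json

def accumulate_tool_calls(fragments):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for frag in fragments:
        call = calls.setdefault(frag["index"], {"name": "", "arguments": ""})
        if frag.get("name"):
            call["name"] = frag["name"]                    # name arrives once
        call["arguments"] += frag.get("arguments") or ""   # JSON arrives in slices
    return [calls[i] for i in sorted(calls)]

fragments = [
    {"index": 0, "name": "get_weather", "arguments": '{"ci'},
    {"index": 0, "name": None, "arguments": 'ty": "Paris"}'},
]
calls = accumulate_tool_calls(fragments)
# json.loads(calls[0]["arguments"]) == {"city": "Paris"}
```

Only parse the arguments after the stream finishes (or the finish reason is `tool_calls`); mid-stream they are almost always invalid JSON.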
## FastAPI SSE Endpoint
```python
from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

@app.get("/chat/stream")
async def stream_chat(prompt: str):
    """Server-Sent Events endpoint for streaming."""
    async def generate():
        async for token in async_stream(prompt):
            yield {"event": "token", "data": token}
        yield {"event": "done", "data": ""}
    return EventSourceResponse(generate())
```
## Frontend SSE Consumer
```typescript
async function streamChat(prompt: string, onToken: (t: string) => void) {
  const response = await fetch("/chat/stream?prompt=" + encodeURIComponent(prompt));
  const reader = response.body?.getReader();
  if (!reader) return;
  const decoder = new TextDecoder();
  let buffer = "";
  let eventName = "message";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith("event: ")) eventName = line.slice(7);
      else if (line.startsWith("data: ") && eventName === "token") onToken(line.slice(6));
      else if (line === "") eventName = "message"; // blank line terminates an event
    }
  }
}
```