LLM inference infrastructure, serving frameworks (vLLM, TGI, TensorRT-LLM), quantization techniques, batching strategies, and streaming response patterns. Use when designing LLM serving infrastructure, optimizing inference latency, or scaling LLM deployments.
Repository: melodic-software/claude-code-plugins
systems-design
plugins/systems-design/skills/llm-serving-patterns/SKILL.md
January 21, 2026
npx add-skill https://github.com/melodic-software/claude-code-plugins/blob/main/plugins/systems-design/skills/llm-serving-patterns/SKILL.md -a claude-code --skill llm-serving-patterns

Installation paths:
.claude/skills/llm-serving-patterns/

# LLM Serving Patterns

## When to Use This Skill

Use this skill when:

- Designing LLM inference infrastructure
- Choosing between serving frameworks (vLLM, TGI, TensorRT-LLM)
- Implementing quantization for production deployment
- Optimizing batching and throughput
- Building streaming response systems
- Scaling LLM deployments cost-effectively

**Keywords:** LLM serving, inference, vLLM, TGI, TensorRT-LLM, quantization, INT8, INT4, FP16, batching, continuous batching, streaming, SSE, WebSocket, KV cache, PagedAttention, speculative decoding

## LLM Serving Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────────┐
│                          LLM Serving Stack                          │
├─────────────────────────────────────────────────────────────────────┤
│  Clients (API, Chat UI, Agents)                                     │
│                  │                                                  │
│                  ▼                                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │              Load Balancer / API Gateway                    │    │
│  │  • Rate limiting  • Authentication  • Request routing       │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                  │                                                  │
│                  ▼                                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    Inference Server                         │    │
│  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────────┐    │    │
│  │  │   Request   │   │   Batching  │   │    KV Cache     │    │    │
│  │  │    Queue    │──▶│    Engine   │──▶│   Management    │    │    │
│  │  └─────────────┘   └─────────────┘   └─────────────────┘    │    │
│  │                  │                                          │    │
│  │                  ▼                                          │    │
│  │              Model Execution (GPU)                          │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
```
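The batching engine is the piece that most affects throughput. Continuous batching (the approach popularized by vLLM and TGI) differs from static batching in that a finished sequence frees its batch slot immediately and a queued request is admitted mid-flight, instead of waiting for the whole batch to drain. A minimal scheduling sketch of that idea (the `Request` class and the step model are illustrative only, not any framework's actual API):

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Request:
    rid: int
    tokens_needed: int   # total tokens this request will generate
    tokens_done: int = 0

def continuous_batching(requests, max_batch):
    """Toy continuous-batching scheduler: every decode step produces one
    token per in-flight request, finished requests retire immediately,
    and queued requests fill the freed slots on the next step."""
    queue = deque(requests)
    in_flight, completed, steps = [], [], 0
    while queue or in_flight:
        # Admit queued requests into free slots (no waiting for the
        # whole batch to finish, unlike static batching).
        while queue and len(in_flight) < max_batch:
            in_flight.append(queue.popleft())
        # One decode step across the whole active batch.
        steps += 1
        for r in in_flight:
            r.tokens_done += 1
        # Retire finished requests right away, freeing their slots.
        still = []
        for r in in_flight:
            (completed if r.tokens_done >= r.tokens_needed else still).append(r)
        in_flight = still
    return steps, [r.rid for r in completed]

reqs = [Request(0, 8), Request(1, 2), Request(2, 4), Request(3, 2)]
steps, order = continuous_batching(reqs, max_batch=2)
print(steps, order)  # 8 decode steps; finish order [1, 2, 0, 3]
```

With a static batch of 2, the same workload would take 8 + 4 = 12 decode steps, because the short requests would sit idle until the longest request in their batch finished; continuous batching completes it in 8. Real engines additionally bound admission by KV-cache memory rather than a fixed `max_batch`.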