ML inference latency optimization, model compression, distillation, caching strategies, and edge deployment patterns. Use when optimizing inference performance, reducing model size, or deploying ML at the edge.
View on GitHub: melodic-software/claude-code-plugins
plugins/systems-design/skills/ml-inference-optimization/SKILL.md
January 21, 2026
# ML Inference Optimization

## When to Use This Skill

Use this skill when:

- Optimizing ML inference latency
- Reducing model size for deployment
- Implementing model compression techniques
- Designing inference caching strategies
- Deploying models at the edge
- Balancing accuracy vs. latency trade-offs

**Keywords:** inference optimization, latency, model compression, distillation, pruning, quantization, caching, edge ML, TensorRT, ONNX, model serving, batching, hardware acceleration

## Inference Optimization Overview

```text
┌─────────────────────────────────────────────────────────────────────┐
│                    Inference Optimization Stack                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                         Model Level                          │   │
│  │  Distillation │ Pruning │ Quantization │ Architecture Search │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│                                  ▼                                  │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                        Compiler Level                        │   │
│  │  Graph optimization │ Operator fusion │ Memory planning      │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│                                  ▼                                  │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                        Runtime Level                         │   │
│  │  Batching │ Caching │ Async execution │ Multi-threading      │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │