Write Metal/MPS kernels for PyTorch operators. Use when adding MPS device support to operators, implementing Metal shaders, or porting CUDA kernels to Apple Silicon. Covers native_functions.yaml dispatch, host-side operators, and Metal kernel implementation.
February 3, 2026
# Metal Kernel Writing Guide
This skill guides you through implementing Metal kernels for PyTorch operators on Apple Silicon.
**Important:** The goal of this skill is to use native Metal capabilities via the `c10/metal/` infrastructure, NOT MPSGraph. Native Metal kernels provide better control, performance, and maintainability.
## Overview
This skill covers two workflows:
1. **Adding new MPS support** - Implementing a new operator from scratch
2. **Migrating from MPSGraph** - Converting existing MPSGraph-based operators to native Metal
Both workflows involve:
1. **Update dispatch** in `aten/src/ATen/native/native_functions.yaml`
2. **Write Metal kernel** in `aten/src/ATen/native/mps/kernels/`
3. **Implement host-side stub** in `aten/src/ATen/native/mps/operations/`
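As a preview of Step 2, a minimal elementwise Metal kernel might look like the sketch below. This is illustrative only: the kernel name, buffer layout, and the dtype/indexing helpers are assumptions, and real kernels in `aten/src/ATen/native/mps/kernels/` use the `c10/metal/` infrastructure for dtype dispatch and strided indexing, so check an existing kernel there for the current pattern.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical dense (contiguous) elementwise kernel.
// Real kernels typically template over dtype via c10/metal/ helpers
// rather than hard-coding float.
kernel void my_op_dense(device const float* input  [[buffer(0)]],
                        device float*       output [[buffer(1)]],
                        uint index [[thread_position_in_grid]]) {
  // Example body: sigmoid as a stand-in for "my_op".
  output[index] = 1.0f / (1.0f + exp(-input[index]));
}
```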
## Step 1: Update native_functions.yaml
**Location:** `aten/src/ATen/native/native_functions.yaml`
### For New Operators
Find the operator entry and add MPS dispatch:
```yaml
# Simple MPS-specific implementation
- func: my_op(Tensor self) -> Tensor
  dispatch:
    CPU: my_op_cpu
    CUDA: my_op_cuda
    MPS: my_op_mps

# Shared implementation across devices (preferred for structured kernels)
- func: my_op.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
  dispatch:
    CPU, CUDA, MPS: my_op_out

# Structured kernel (preferred for new ops)
- func: my_op.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  dispatch:
    CPU, CUDA, MPS: my_op_out
```
### For Migrating from MPSGraph
When migrating an existing operator from MPSGraph to native Metal, **consolidate the dispatch entry**:
```yaml
# BEFORE (MPSGraph-based, separate dispatch)
- func: atan2.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  dispatch:
    CPU, CUDA: atan2_out
    MPS: atan2_out_mps  # Separate MPS implementation

# AFTER (native Metal, shared dispatch via stub)
- func: atan2.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  dispatch:
    CPU, CUDA, MPS: atan2_out
```
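With the consolidated dispatch, the MPS backend plugs into the shared structured op through a stub registration in `aten/src/ATen/native/mps/operations/`. The following is a rough sketch of that host-side hookup; the helper that launches the Metal shader (`binary_op_kernel` here) is an assumed name, so mirror an existing binary-op `.mm` file for the actual shader-library calls.

```objective-c++
// Illustrative sketch of a host-side stub registration.
#include <ATen/native/BinaryOps.h>
#include <ATen/native/TensorIterator.h>

namespace at::native {

// Hypothetical helper: launches the "atan2" Metal kernel over the
// iterator's operands. The real code goes through the MPS shader-library
// infrastructure rather than this assumed function.
static void atan2_mps_kernel(TensorIteratorBase& iter) {
  mps::binary_op_kernel("atan2", iter);  // assumed helper name
}

// Hooks the Metal kernel into the shared structured `atan2_out`,
// so native_functions.yaml no longer needs a separate MPS entry.
REGISTER_DISPATCH(atan2_stub, &atan2_mps_kernel)

} // namespace at::native
```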