Debug AOTInductor (AOTI) errors and crashes. Use when encountering AOTI segfaults, device mismatch errors, constant loading failures, or runtime errors from aot_compile, aot_load, aoti_compile_and_package, or aoti_load_package.
# AOTI Debugging Guide
This skill helps diagnose and fix common AOTInductor issues.
## First Step: Always Check Device and Shape Matching
**For ANY AOTI error (segfault, exception, crash, wrong output), ALWAYS check these first:**
1. **Compile device == Load device**: The model must be loaded on the same device type it was compiled on
2. **Input devices match**: Runtime inputs must be on the same device as the compiled model
3. **Input shapes match**: Runtime input shapes must match the shapes used during compilation (or satisfy dynamic shape constraints)
```python
import torch

# During compilation - note the device and shapes
model = MyModel().eval()  # What device? CPU or .cuda()?
inp = torch.randn(2, 10)  # What device? What shape?
compiled_so = torch._inductor.aot_compile(model, (inp,))
# During loading - device type MUST match compilation
loaded = torch._export.aot_load(compiled_so, "???") # Must match model/input device above
# During inference - device and shapes MUST match
out = loaded(inp.to("???")) # Must match compile device, shape must match
```
**If any of these don't match, you will get errors ranging from segfaults to exceptions to wrong outputs.**
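Because a device or shape mismatch can surface as a hard segfault rather than a Python exception, it can help to validate inputs yourself before calling the loaded model. The sketch below is not part of AOTI; `check_aoti_inputs`, and the idea of recording `expected_device` and `expected_shapes` at compile time, are assumptions for illustration.

```python
import torch

def check_aoti_inputs(inputs, expected_device, expected_shapes):
    """Raise a clear Python error instead of risking a native crash.

    `expected_device` is the device type used at compile time (e.g. "cpu",
    "cuda"); `expected_shapes` are the shapes of the compile-time example
    inputs. Both must be recorded by you - AOTI does not expose them.
    """
    for i, (t, shape) in enumerate(zip(inputs, expected_shapes)):
        if t.device.type != expected_device:
            raise ValueError(
                f"input {i} is on {t.device.type!r}, but the model "
                f"was compiled for {expected_device!r}"
            )
        if tuple(t.shape) != tuple(shape):
            raise ValueError(
                f"input {i} has shape {tuple(t.shape)}, "
                f"expected {tuple(shape)}"
            )
```

Run this immediately before invoking the loaded model; a `ValueError` from here is far easier to diagnose than a crash inside the compiled shared library.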
## Key Constraint: Device Type Matching
**AOTI requires compile and load to use the same device type.**
- If you compile on CUDA, you must load on CUDA (device index can differ)
- If you compile on CPU, you must load on CPU
- Cross-device loading (e.g., compile on GPU, load on CPU) is NOT supported
## Common Error Patterns
### 1. Device Mismatch Segfault
**Symptom**: Segfault, exception, or crash during `aot_load()` or model execution.
**Example error messages**:
- `The specified pointer resides on host memory and is not registered with any CUDA device`
- Crash during constant loading in AOTInductorModelBase
- `Expected out tensor to have device cuda:0, but got cpu instead`
**Cause**: Compile and load device types don't match (see "First Step" above).
**Solution**: Ensure compile and load use the same device type, and move all runtime inputs to that device before calling the model.