POC validation patterns to catch issues before committing to long-running ML experiments. TRIGGERS - fail-fast, POC validation, preflight check, experiment validation, schema validation, gradient check, sanity check, smoke test.
View on GitHubFebruary 5, 2026
Select agents to install to:
npx add-skill https://github.com/terrylica/cc-skills/blob/main/plugins/devops-tools/skills/ml-failfast-validation/SKILL.md -a claude-code --skill ml-failfast-validationInstallation paths:
.claude/skills/ml-failfast-validation/# ML Fail-Fast Validation
POC validation patterns to catch issues before committing to long-running ML experiments.
## When to Use This Skill
Use this skill when:
- Starting a new ML experiment that will run for hours
- Validating model architecture before full training
- Checking gradient flow and data pipeline integrity
- Implementing POC validation checklists
- Debugging prediction collapse or gradient explosion issues
---
## 1. Why Fail-Fast?
| Without Fail-Fast | With Fail-Fast |
| ------------------------- | ---------------------- |
| Discover crash 4 hours in | Catch in 30 seconds |
| Debug from cryptic error | Clear error message |
| Lose GPU time | Validate before commit |
| Silent data issues | Explicit schema checks |
**Principle**: Validate everything that can go wrong BEFORE the expensive computation.
---
## 2. POC Validation Checklist
### Minimum Viable POC (5 Checks)
```python
def run_poc_validation():
"""Fast validation before full experiment."""
print("=" * 60)
print("FAIL-FAST POC VALIDATION")
print("=" * 60)
# [1/5] Model instantiation
print("\n[1/5] Model instantiation...")
model = create_model(architecture, input_size=n_features)
x = torch.randn(32, seq_len, n_features).to(device)
out = model(x)
assert out.shape == (32, 1), f"Output shape wrong: {out.shape}"
print(f" Input: (32, {seq_len}, {n_features}) -> Output: {out.shape}")
print(" Status: PASS")
# [2/5] Gradient flow
print("\n[2/5] Gradient flow...")
y = torch.randn(32, 1).to(device)
loss = F.mse_loss(out, y)
loss.backward()
grad_norms = [p.grad.norm().item() for p in model.parameters() if p.grad is not None]
assert len(grad_norms) > 0, "No gradients!"
assert all(np.isfinite(g) for g in grad_norms), "NaN/Inf gradients!"
print(f" Max grad norm: {max(grad_norms):.4f}")
print(" Status: PASS")
# [3/5] NDJSON artifact validation
print("