Use when a dbt Cloud/platform job fails and you need to diagnose the root cause, especially when error messages are unclear or when intermittent failures occur. Do not use for local dbt development errors.
February 1, 2026
`npx add-skill https://github.com/dbt-labs/dbt-agent-skills/blob/main/skills/troubleshooting-dbt-job-errors/SKILL.md -a claude-code --skill troubleshooting-dbt-job-errors`

Installation paths: `.claude/skills/troubleshooting-dbt-job-errors/`

# Troubleshooting dbt Job Errors
Systematically diagnose and resolve dbt Cloud job failures using available MCP tools, CLI commands, and data investigation.
## When to Use
- dbt Cloud / dbt platform job failed and you need to find the root cause
- Intermittent job failures that are hard to reproduce
- Error messages that don't clearly indicate the problem
- Post-merge failures where a recent change may have caused the issue
**Not for:** Local dbt development errors. Use the `using-dbt-for-analytics-engineering` skill instead.
## The Iron Rule
**Never modify a test to make it pass without understanding why it's failing.**
A failing test is evidence of a problem. Changing the test to pass hides the problem. Investigate the root cause first.
## Rationalizations That Mean STOP
| You're Thinking... | Reality |
|-------------------|---------|
| "Just make the test pass" | The test is telling you something is wrong. Investigate first. |
| "There's a board meeting in 2 hours" | Rushing to a fix without diagnosis creates bigger problems. |
| "We've already spent 2 days on this" | Sunk cost doesn't justify skipping proper diagnosis. |
| "I'll just update the accepted values" | Are the new values valid business data or bugs? Verify first. |
| "It's probably just a flaky test" | "Flaky" means there's an underlying issue. Find it. We don't allow flaky tests to stay. |
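
As an illustration of "verify first" for the accepted-values case, the sketch below counts the values that fall outside the accepted list so each one can be judged individually. The function name `unexpected_values` and the sample data are hypothetical; in practice the observed values would come from a query against the failing column.

```python
from collections import Counter


def unexpected_values(observed, accepted):
    """Return {value: row_count} for observed values outside the accepted set."""
    accepted_set = set(accepted)
    counts = Counter(observed)
    return {value: n for value, n in counts.items() if value not in accepted_set}


# Hypothetical sample: statuses pulled from the column the test covers.
observed = ["placed", "shipped", "shipped", "SHIPPED", "returned", "refund_pending"]
accepted = ["placed", "shipped", "returned"]

for value, n in sorted(unexpected_values(observed, accepted).items()):
    print(f"{value!r}: {n} row(s)")
```

In this made-up sample, `SHIPPED` looks like an upstream casing bug, while `refund_pending` may be a legitimate new status that someone who owns the data should confirm. Only the second kind justifies changing the test.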
## Workflow
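
When the Admin API is unavailable and the user supplies `run_results.json`, a first pass at the "classify error type" step might look like the sketch below. The `results`, `unique_id`, `status`, and `message` fields follow the dbt run results artifact; the keyword lists, the `Unclassified` fallback, and the sample data are assumptions for illustration, not part of this skill.

```python
import json

# Assumed keyword hints; tune them to the messages your warehouse actually emits.
INFRA_HINTS = ("timeout", "timed out", "connection", "could not connect")
CODE_HINTS = ("compilation error", "syntax error", "depends on a node")


def classify(result):
    """Map one run_results.json entry to a workflow branch, or None if it did not fail."""
    status = result.get("status")
    message = (result.get("message") or "").lower()
    if status == "fail":  # a test ran and returned failing rows
        return "Data/Test Failure"
    if status != "error":
        return None
    if any(hint in message for hint in INFRA_HINTS):
        return "Infrastructure"
    if any(hint in message for hint in CODE_HINTS):
        return "Code/Compilation"
    return "Unclassified"  # needs a human read of the full log


def summarize(run_results):
    """Return (unique_id, category) for every failed or errored node."""
    summary = []
    for result in run_results.get("results", []):
        category = classify(result)
        if category is not None:
            summary.append((result.get("unique_id"), category))
    return summary


# Hypothetical excerpt of a run_results.json file.
sample = json.loads("""
{"results": [
  {"unique_id": "model.demo.orders", "status": "success", "message": "OK"},
  {"unique_id": "test.demo.accepted_values_orders_status", "status": "fail",
   "message": "Got 2 results, configured to fail if != 0"},
  {"unique_id": "model.demo.customers", "status": "error",
   "message": "Database Error: connection timed out"}
]}
""")

for unique_id, category in summarize(sample):
    print(f"{category}: {unique_id}")
```

Keyword matching is only a triage aid. Anything that lands in `Unclassified`, or any classification that does not match the full log, still needs manual investigation.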
```mermaid
flowchart TD
A[Job failure reported] --> B{MCP Admin API available?}
B -->|yes| C[Use list_jobs_runs to get history]
B -->|no| D[Ask user for logs and run_results.json]
C --> E[Use get_job_run_error for details]
D --> F[Classify error type]
E --> F
F --> G{Error type?}
G -->|Infrastructure| H[Check warehouse, connections, timeouts]
G -->|Code/Compilation| I[Check git history for recent changes]
G -->|Data/Test Failure| J[Use discovering-data skill to investigate]
H --> K{Root cause found?}
I --> K
J --> K
K -->|yes| L[Create bran