Back to Skills

k8s-debug

verified

Comprehensive Kubernetes debugging and troubleshooting toolkit. Use this skill when diagnosing Kubernetes cluster issues, debugging failing pods, investigating network connectivity problems, analyzing resource usage, troubleshooting deployments, or performing cluster health checks.

View on GitHub

Marketplace

akin-ozer

akin-ozer/cc-devops-skills

Plugin

devops-skills

Repository

akin-ozer/cc-devops-skills
48stars

devops-skills-plugin/skills/k8s-debug/SKILL.md

Last Verified

February 1, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/akin-ozer/cc-devops-skills/blob/main/devops-skills-plugin/skills/k8s-debug/SKILL.md -a claude-code --skill k8s-debug

Installation paths:

Claude
.claude/skills/k8s-debug/
Powered by add-skill CLI

Instructions

# Kubernetes Debugging Skill

## Overview

Systematic toolkit for debugging and troubleshooting Kubernetes clusters, pods, services, and deployments. Provides scripts, workflows, and reference guides for identifying and resolving common Kubernetes issues efficiently.

## When to Use This Skill

Invoke this skill when encountering:
- Pod failures (CrashLoopBackOff, ImagePullBackOff, Pending, OOMKilled)
- Service connectivity or DNS resolution issues
- Network policy or ingress problems
- Volume and storage mount failures
- Deployment rollout issues
- Cluster health or performance degradation
- Resource exhaustion (CPU/memory)
- Configuration problems (ConfigMaps, Secrets, RBAC)

## Debugging Workflow

Follow this systematic approach for any Kubernetes issue:

### 1. Identify the Problem Layer

Categorize the issue:
- **Application Layer**: Application crashes, errors, bugs
- **Pod Layer**: Pod not starting, restarting, or pending
- **Service Layer**: Network connectivity, DNS issues
- **Node Layer**: Node not ready, resource exhaustion
- **Cluster Layer**: Control plane issues, API problems
- **Storage Layer**: Volume mount failures, PVC issues
- **Configuration Layer**: ConfigMap, Secret, RBAC issues

### 2. Gather Diagnostic Information

Use the appropriate diagnostic script based on scope:

#### Pod-Level Diagnostics
Use `scripts/pod_diagnostics.py` for comprehensive pod analysis:

```bash
python3 scripts/pod_diagnostics.py <pod-name> -n <namespace>
```

This script gathers:
- Pod status and description
- Pod events
- Container logs (current and previous)
- Resource usage
- Node information
- YAML configuration

Output can be saved for analysis: `python3 scripts/pod_diagnostics.py <pod-name> -n <namespace> -o diagnostics.txt`

#### Cluster-Level Health Check
Use `scripts/cluster_health.sh` for overall cluster diagnostics:

```bash
./scripts/cluster_health.sh
```

This script checks:
- Cluster info and version
- Node status and resources
- Pods across all namespace

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
9673 chars