Kafka operations expert for deployment, monitoring, and tooling. Kubernetes (Strimzi, Confluent), Terraform IaC, Prometheus/Grafana observability, and CLI tools (kcat, kafkactl). Use for Kafka deployment, monitoring setup, or operational tasks.
February 4, 2026
npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-kafka/skills/kafka-ops/SKILL.md -a claude-code --skill kafka-ops
Installation paths:
.claude/skills/kafka-ops/
# Kafka Operations
Expert in Apache Kafka deployment, monitoring, and operational tooling.
## ⚠️ Chunking Rule
A complete Kafka infrastructure setup can run to 800+ lines. Generate ONE component per response:
1. Deployment → 2. Monitoring → 3. CLI Tools → 4. Automation
## Core Capabilities
### Kubernetes Deployment
- **Strimzi Operator**: Open-source Kafka on K8s
- **Confluent for Kubernetes**: Enterprise Kafka
- **MSK/Confluent Cloud**: Managed services
### Infrastructure as Code
- Terraform modules for Kafka clusters
- AWS MSK, Confluent Cloud, Aiven provisioning
- Network and security configuration
### Observability
- Prometheus metrics (JMX exporter)
- Grafana dashboards for Kafka
- Consumer lag monitoring
- Alert configuration
### CLI Tools
- **kcat**: Swiss army knife for Kafka
- **kafkactl**: Modern CLI for Kafka
- **kafka-console-\***: Built-in tools shipped with Kafka (e.g. `kafka-console-producer`, `kafka-console-consumer`)
## Kubernetes Deployment
```yaml
# Strimzi Kafka Cluster
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 50Gi
```
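The Monitoring section scrapes brokers on port 9404, which presumes a Prometheus JMX exporter is running alongside each broker. With Strimzi, one way to enable it is the `metricsConfig` field of the `Kafka` resource. The following is a partial sketch, not a complete manifest: the ConfigMap name `kafka-metrics`, its key, and the single catch-all rule are assumptions chosen for illustration, and a real rule set is usually more selective.

```yaml
# Partial Kafka resource: only the metrics-related fields are shown.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...replicas, listeners, and storage as in the cluster definition...
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics            # hypothetical ConfigMap name
          key: kafka-metrics-config.yml  # hypothetical key
---
# JMX exporter rules referenced by metricsConfig above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-metrics
data:
  kafka-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
      # Catch-all for kafka.server gauges, e.g.
      # kafka.server<type=ReplicaManager, name=UnderReplicatedPartitions>
      # becomes kafka_server_replicamanager_underreplicatedpartitions.
      - pattern: kafka.server<type=(.+), name=(.+)><>Value
        name: kafka_server_$1_$2
        type: GAUGE
```

The resulting metric names depend entirely on these rules, so any dashboards or alerts need to match whatever rule set is actually deployed.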
## Terraform
```hcl
# AWS MSK Cluster
resource "aws_msk_cluster" "kafka" {
  cluster_name           = "my-kafka-cluster"
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.private_subnets
    security_groups = [aws_security_group.kafka.id]

    storage_info {
      ebs_storage_info {
        volume_size = 100
      }
    }
  }
}
```
## Monitoring
```yaml
# Prometheus scrape config
- job_name: 'kafka'
  static_configs:
    - targets: ['kafka-1:9404', 'kafka-2:9404', 'kafka-3:9404']
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance
```
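The capabilities list also mentions consumer lag monitoring and alert configuration. A minimal sketch of Prometheus alerting rules for both follows. The metric names are assumptions: `kafka_consumergroup_lag` is the name exposed by the community kafka_exporter (a separate exporter from the JMX one scraped above), and the under-replicated-partitions name depends on the JMX exporter rules in use. The thresholds and durations are placeholders to tune per workload.

```yaml
# Prometheus alerting rules (sketch; metric names and thresholds are assumptions)
groups:
  - name: kafka-alerts
    rules:
      # Fires when a consumer group stays far behind the log end offset.
      - alert: KafkaConsumerLagHigh
        expr: sum by (consumergroup, topic) (kafka_consumergroup_lag) > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}"
      # Fires when any partition has fewer in-sync replicas than configured.
      - alert: KafkaUnderReplicatedPartitions
        expr: sum(kafka_server_replicamanager_underreplicatedpartitions) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "One or more Kafka partitions are under-replicated"
```

The `for` clause keeps short spikes, such as a consumer restart or a rolling broker upgrade, from paging anyone.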
Key metrics to monitor:
- `kafka_server_bro