Back to Skills

testing-skills-with-subagents

verified

Use before skill deployment to verify pressure resistance via TDD RED-GREEN-REFACTOR cycle with Serena metrics tracking - measures compliance score (0.00-1.00) across pressure scenarios

View on GitHub

Marketplace

shannon-framework

krzemienski/shannon-framework

Plugin

shannon

Repository

krzemienski/shannon-framework
1stars

skills/testing-skills-with-subagents/SKILL.md

Last Verified

January 21, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/krzemienski/shannon-framework/blob/main/skills/testing-skills-with-subagents/SKILL.md -a claude-code --skill testing-skills-with-subagents

Installation paths:

Claude
.claude/skills/testing-skills-with-subagents/
Powered by add-skill CLI

Instructions

# Testing Skills With Subagents (Shannon-Enhanced)

## Overview

**TDD for process documentation with quantitative metrics.**

Red-Green-Refactor applies to skills: (1) Run baseline WITHOUT skill, (2) Write skill addressing failures, (3) Close loopholes until bulletproof. Shannon enhancement adds Serena metrics tracking for compliance scoring.

**Compliance Scoring (0.00-1.00):**
- 0.00-0.60: Fails under pressure, multiple rationalizations
- 0.61-0.85: Mostly compliant, 1-2 new loopholes
- 0.86-0.99: Bulletproof, rare exceptions
- 1.00: Perfect compliance across all scenarios

## When to Use

Test skills enforcing discipline (TDD, code review, testing) that:
- Have compliance costs (time, effort)
- Could be rationalized away
- Contradict immediate goals (speed over quality)

**Don't test:** Reference skills, pure documentation, skills without rules.

## TDD Cycle with Metrics

| Phase | Action | Shannon Metric |
|-------|--------|-----------------|
| **RED** | Run scenario WITHOUT skill | baseline_failures: count violations |
| **GREEN** | Write skill, run WITH skill | compliance_score: % correct choices |
| **REFACTOR** | Close loopholes | loophole_count: new rationalizations found |
| **Verify** | Re-test scenarios | final_score: 0.00-1.00 |

## RED Phase: Baseline (Watch It Fail)

```bash
# Serena metric: baseline_failures
- Run pressure scenarios WITHOUT skill
- Document rationalizations verbatim
- Log failure patterns to Serena
  pressure_type: "sunk_cost|time|authority|exhaustion|social"
  rationalization: "exact words agent used"
  scenario_complexity: 1-3 pressures
```

**Pressure types to combine:**
- Time (deadline, window closing)
- Sunk cost (hours invested)
- Authority (senior says skip)
- Exhaustion (end of day)
- Social (seeming dogmatic)

## GREEN Phase: Write Minimal Skill

Write skill addressing ONLY observed baseline failures.

Run same scenarios WITH skill. Document:
```
compliance_score = (correct_choices / total_scenarios) * 1.0
Range: 0.00-1.0

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
4615 chars