Use before skill deployment to verify pressure resistance via TDD RED-GREEN-REFACTOR cycle with Serena metrics tracking - measures compliance score (0.00-1.00) across pressure scenarios
View on GitHubkrzemienski/shannon-framework
shannon
January 21, 2026
Select agents to install to:
npx add-skill https://github.com/krzemienski/shannon-framework/blob/main/skills/testing-skills-with-subagents/SKILL.md -a claude-code --skill testing-skills-with-subagentsInstallation paths:
.claude/skills/testing-skills-with-subagents/# Testing Skills With Subagents (Shannon-Enhanced) ## Overview **TDD for process documentation with quantitative metrics.** Red-Green-Refactor applies to skills: (1) Run baseline WITHOUT skill, (2) Write skill addressing failures, (3) Close loopholes until bulletproof. Shannon enhancement adds Serena metrics tracking for compliance scoring. **Compliance Scoring (0.00-1.00):** - 0.00-0.60: Fails under pressure, multiple rationalizations - 0.61-0.85: Mostly compliant, 1-2 new loopholes - 0.86-0.99: Bulletproof, rare exceptions - 1.00: Perfect compliance across all scenarios ## When to Use Test skills enforcing discipline (TDD, code review, testing) that: - Have compliance costs (time, effort) - Could be rationalized away - Contradict immediate goals (speed over quality) **Don't test:** Reference skills, pure documentation, skills without rules. ## TDD Cycle with Metrics | Phase | Action | Shannon Metric | |-------|--------|-----------------| | **RED** | Run scenario WITHOUT skill | baseline_failures: count violations | | **GREEN** | Write skill, run WITH skill | compliance_score: % correct choices | | **REFACTOR** | Close loopholes | loophole_count: new rationalizations found | | **Verify** | Re-test scenarios | final_score: 0.00-1.00 | ## RED Phase: Baseline (Watch It Fail) ```bash # Serena metric: baseline_failures - Run pressure scenarios WITHOUT skill - Document rationalizations verbatim - Log failure patterns to Serena pressure_type: "sunk_cost|time|authority|exhaustion|social" rationalization: "exact words agent used" scenario_complexity: 1-3 pressures ``` **Pressure types to combine:** - Time (deadline, window closing) - Sunk cost (hours invested) - Authority (senior says skip) - Exhaustion (end of day) - Social (seeming dogmatic) ## GREEN Phase: Write Minimal Skill Write skill addressing ONLY observed baseline failures. Run same scenarios WITH skill. Document: ``` compliance_score = (correct_choices / total_scenarios) * 1.0 Range: 0.00-1.0