Use to audit test quality with Google-Fellow-level SRE scrutiny: identifies tautological tests, coverage gaming, weak assertions, and missing corner cases. Creates a bd epic with tasks for improvements, then runs SRE task refinement on each task.
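For a concrete sense of what a "tautological test" looks like, here is a minimal pytest sketch; `compute_discount` and both tests are hypothetical, invented purely for illustration and not taken from any audited codebase:

```python
# Hypothetical production function, a stand-in for illustration only.
def compute_discount(price: float, rate: float) -> float:
    return price * (1 - rate)

# Tautological test: the "expected" value is derived from the very call
# under test, so the assertion can never fail and catches no bug.
# Coverage goes up; confidence should not.
def test_discount_applied():
    expected = compute_discount(100.0, 0.25)
    assert compute_discount(100.0, 0.25) == expected

# Stronger version: the expected value is pinned independently, so a
# regression in the discount formula actually fails the test.
def test_discount_applied_independent_oracle():
    assert compute_discount(100.0, 0.25) == 75.0
```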
npx add-skill https://github.com/withzombies/hyperpowers/blob/main/skills/analyzing-test-effectiveness/SKILL.md -a claude-code --skill analyzing-test-effectiveness

Installation paths:
.claude/skills/analyzing-test-effectiveness/

<skill_overview>
Audit test suites for real effectiveness, not vanity metrics. Identify tests that provide false confidence (tautological, mock-testing, line hitters) and missing corner cases. Create a bd epic with tracked tasks for improvements. Run SRE task refinement on each task before execution.

**CRITICAL MINDSET: Assume tests were written by junior engineers optimizing for coverage metrics.** Default to skeptical: a test is RED or YELLOW until proven GREEN. You MUST read production code before categorizing tests. GREEN is the exception, not the rule.
</skill_overview>

<rigidity_level>
MEDIUM FREEDOM - Follow the 8-phase analysis process exactly. Categorization criteria (RED/YELLOW/GREEN) are rigid. Corner case discovery adapts to the specific codebase. Output format is flexible but must include all sections.
</rigidity_level>

<quick_reference>
| Phase | Action | Output |
|-------|--------|--------|
| 1. Inventory | List all test files and functions | Test catalog |
| 2. Read Production Code | Read the actual code each test claims to test | Context for analysis |
| 3. Trace Call Paths | Verify tests exercise production code, not mocks/utilities | Call path verification |
| 4. Categorize (Skeptical) | Apply RED/YELLOW/GREEN - default to the harsher rating | Categorized tests |
| 5. Self-Review | Challenge every GREEN - would a senior SRE agree? | Validated categories |
| 6. Corner Cases | Identify missing edge cases per module | Gap analysis |
| 7. Prioritize | Rank by business criticality | Priority matrix |
| 8. bd Issues | Create epic + tasks, run SRE refinement | Tracked improvement plan |

**MANDATORY: Read production code BEFORE categorizing tests. You cannot assess a test without understanding what it claims to test.**

**Core Questions for Each Test:**

1. What bug would this catch? (If you can't name one → RED)
2. Does it exercise PRODUCTION code or a mock/test utility? (Mock → RED or YELLOW)
3. Could code break while the test passes? (If yes → YELLOW or RED)
4. Mea
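To ground the first three questions, here is a hedged sketch contrasting a mock-round-trip test with one that exercises production logic. `PaymentService`, its gateway, and both tests are hypothetical names invented for this example, not part of the skill itself:

```python
from unittest.mock import MagicMock

import pytest

# Hypothetical production class, a stand-in for illustration only.
class PaymentService:
    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, amount: float) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.submit(amount)

# YELLOW at best: the assertion only round-trips the mock's canned
# value. If charge() started mangling the amount before submitting it,
# the mock would ignore the argument and this would still pass
# (Question 3: code breaks, test passes).
def test_charge_returns_gateway_result():
    gateway = MagicMock()
    gateway.submit.return_value = "ok"
    assert PaymentService(gateway).charge(10.0) == "ok"

# GREEN candidate: exercises a production decision that never touches
# the mock. Question 1 has an answer: this catches the bug where
# someone deletes the non-positive-amount guard.
def test_charge_rejects_nonpositive_amount():
    with pytest.raises(ValueError):
        PaymentService(gateway=MagicMock()).charge(0)
```

If the mock-based test must stay, one cheap strengthening is adding `gateway.submit.assert_called_once_with(10.0)`, which at least pins the argument the production code sends to its collaborator.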