Generates a rigorous experiment design given a hypothesis. Use when asked to design experiments, plan experiments, create an experimental setup, or figure out how to test a research hypothesis. Covers controls, baselines, ablations, metrics, statistical tests, and compute estimates.
View on GitHub: skills/experiment-design-checklist/SKILL.md
February 1, 2026
npx add-skill https://github.com/GhostScientist/skills/blob/main/skills/experiment-design-checklist/SKILL.md -a claude-code --skill experiment-design-checklist

Installation paths:
.claude/skills/experiment-design-checklist/

# Experiment Design Checklist

Prevent the "I ran experiments for 3 months and they're meaningless" disaster through rigorous upfront design.

## The Core Principle

Before running ANY experiment, you should be able to answer:

1. What specific claim will this experiment support or refute?
2. What would convince a skeptical reviewer?
3. What could go wrong that would invalidate the results?

## Process

### Step 1: State the Hypothesis Precisely

Convert your research question into falsifiable predictions:

**Template:**

```
If [intervention/method], then [measurable outcome], because [mechanism].
```

**Examples:**

- "If we add auxiliary contrastive loss, then downstream task accuracy increases by >2%, because representations become more separable."
- "If we use learned positional encodings, then performance on sequences >4096 tokens improves, because the model can extrapolate beyond training length."

**Null hypothesis:** What does "no effect" look like? This is what you're trying to reject.

### Step 2: Identify Variables

**Independent Variables (what you manipulate):**

| Variable | Levels | Rationale |
|----------|--------|-----------|
| [Var 1] | [Level A, B, C] | [Why these levels] |

**Dependent Variables (what you measure):**

| Metric | How Measured | Why This Metric |
|--------|--------------|-----------------|
| [Metric 1] | [Procedure] | [Justification] |

**Control Variables (what you hold constant):**

| Variable | Fixed Value | Why Fixed |
|----------|-------------|-----------|
| [Var 1] | [Value] | [Prevents confound X] |

### Step 3: Choose Baselines

Every experiment needs comparisons. No result is meaningful in isolation.

**Baseline Hierarchy:**

1. **Random/Trivial Baseline**
   - What does random chance achieve?
   - Sanity check that the task isn't trivial
2. **Simple Baseline**
   - Simplest reasonable approach
   - Often embarrassingly effective
3. **Standard Baseline**
   - Well-known method from literature
   - Apples-to-apples comparison
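
The random/trivial rung of the hierarchy can be made concrete with a few lines of code. The sketch below assumes a classification task with discrete labels. The function names (`majority_class_accuracy`, `uniform_random_accuracy`) and the toy labels are hypothetical and chosen only for illustration; they are not part of the skill itself.

```python
import random
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return hits / len(test_labels)


def uniform_random_accuracy(train_labels, test_labels, seed=0):
    """Accuracy of guessing uniformly at random among the observed classes."""
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    classes = sorted(set(train_labels))
    hits = sum(1 for y in test_labels if rng.choice(classes) == y)
    return hits / len(test_labels)


# Hypothetical toy data: 6 "pos" and 4 "neg" training labels.
train = ["pos"] * 6 + ["neg"] * 4
test = ["pos", "pos", "neg", "pos", "neg"]

# Always predicting "pos" gets 3 of the 5 test labels right.
majority_acc = majority_class_accuracy(train, test)
random_acc = uniform_random_accuracy(train, test)

print(f"majority-class baseline: {majority_acc:.2f}")
print(f"uniform-random baseline: {random_acc:.2f}")
```

If a proposed method does not clearly beat both numbers, the task or the metric probably needs rethinking before any larger comparison is run.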