Routes to appropriate deep-RL skills based on problem type and algorithm family
GitHub: tachyon-beep/skillpacks
yzmir-deep-rl
January 24, 2026
Install with:

```bash
npx add-skill https://github.com/tachyon-beep/skillpacks/blob/main/plugins/yzmir-deep-rl/skills/using-deep-rl/SKILL.md -a claude-code --skill using-deep-rl
```

Installation path:
`.claude/skills/using-deep-rl/`

---

# Using Deep RL Meta-Skill

## When to Use This Skill

Invoke this meta-skill when you encounter:

- **RL Implementation**: Implementing reinforcement learning algorithms (Q-learning, DQN, PPO, SAC, etc.)
- **Agent Training**: Training agents in environments (games, robotics, control systems)
- **Sequential Decision-Making**: Problems requiring learning from trial and error
- **Policy Optimization**: Learning policies that maximize cumulative rewards
- **RL Debugging**: Debugging training issues, agents not learning, reward problems

This is the **entry point** for the deep-rl pack. It routes to 13 specialized skills based on problem characteristics.

## How to Access Reference Sheets

**IMPORTANT**: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.

When this skill is loaded from: `skills/using-deep-rl/SKILL.md`

Reference sheets like `rl-foundations.md` are at: `skills/using-deep-rl/rl-foundations.md`

NOT at: `skills/rl-foundations.md` ← WRONG PATH

---

## Core Principle

**Problem type determines algorithm family.** The correct approach depends on:

1. **Action Space**: Discrete (button presses) vs Continuous (joint angles)
2. **Data Regime**: Online (interact with environment) vs Offline (fixed dataset)
3. **Experience Level**: Need foundations vs ready to implement
4. **Special Requirements**: Multi-agent, model-based, exploration, reward design

**Always clarify the problem BEFORE suggesting algorithms.** (A minimal routing sketch following these criteria appears after the skill list below.)

---

## The 13 Deep RL Skills

1. **rl-foundations** - MDP formulation, Bellman equations, value vs policy basics
2. **value-based-methods** - Q-learning, DQN, Double DQN, Dueling DQN, Rainbow
3. **policy-gradient-methods** - REINFORCE, PPO, TRPO, policy optimization
4. **actor-critic-methods** - A2C, A3C, SAC, TD3, advantage functions
5. **model-based-rl** - World models, Dyna, MBPO, planning with learned models
6. **offline-rl** - Batch RL, CQL, IQL, learning from fixed datasets
7. **multi-agent-rl** - MARL, cooperative
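The routing criteria under Core Principle can be read as a simple decision procedure. Below is a minimal sketch in Python; `route_skill` and its parameters are hypothetical illustrations rather than part of the pack's API, and the branch order is one reasonable reading of the criteria above, not the pack's definitive routing.

```python
def route_skill(
    action_space: str,          # "discrete" or "continuous"
    data_regime: str,           # "online" or "offline"
    needs_foundations: bool = False,
    multi_agent: bool = False,
) -> str:
    """Map problem characteristics to a skill name from the list above.

    Hypothetical helper for illustration: the skill names are real,
    but this function and its signature are not part of the pack.
    """
    if needs_foundations:
        return "rl-foundations"        # start with MDPs and Bellman equations
    if multi_agent:
        return "multi-agent-rl"        # cooperative/competitive settings
    if data_regime == "offline":
        return "offline-rl"            # fixed dataset: CQL, IQL
    if action_space == "discrete":
        return "value-based-methods"   # DQN family
    return "actor-critic-methods"      # continuous control: SAC, TD3


# Example: a robot arm (continuous actions) trained by interacting online
print(route_skill(action_space="continuous", data_regime="online"))
# -> actor-critic-methods
```

In practice the clarifying questions come first, per the Core Principle; the sketch only shows how the answers map onto skills.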
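For orientation on what **rl-foundations** covers, the Bellman optimality equation for the action-value function is the standard starting point:

$$
Q^*(s, a) \;=\; \mathbb{E}\big[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\big|\; s_t = s,\ a_t = a \,\big]
$$

Here $\gamma \in [0, 1)$ is the discount factor; Q-learning and the DQN family (skill 2) approximate $Q^*$ by applying this recursion to sampled transitions.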