Back to Skills

splitting-datasets

verified
View on GitHub

Marketplace

claude-code-plugins-plus

jeremylongshore/claude-code-plugins-plus-skills

Plugin

dataset-splitter

ai-ml

Repository

jeremylongshore/claude-code-plugins-plus-skills
1.1kstars

plugins/ai-ml/dataset-splitter/skills/splitting-datasets/SKILL.md

Last Verified

January 22, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/ai-ml/dataset-splitter/skills/splitting-datasets/SKILL.md -a claude-code --skill splitting-datasets

Installation paths:

Claude
.claude/skills/splitting-datasets/
Powered by add-skill CLI

Instructions

# Dataset Splitter

This skill provides automated assistance for dataset splitter tasks.

## Overview

This skill automates the process of dividing a dataset into subsets for training, validating, and testing machine learning models. It ensures proper data preparation and facilitates robust model evaluation.

## How It Works

1. **Analyze Request**: The skill analyzes the user's request to determine the dataset to be split and the desired proportions for each subset.
2. **Generate Code**: Based on the request, the skill generates Python code utilizing standard ML libraries to perform the data splitting.
3. **Execute Splitting**: The code is executed to split the dataset into training, validation, and testing sets according to the specified ratios.

## When to Use This Skill

This skill activates when you need to:
- Prepare a dataset for machine learning model training.
- Create training, validation, and testing sets.
- Partition data to evaluate model performance.

## Examples

### Example 1: Splitting a CSV file

User request: "Split the data in 'my_data.csv' into 70% training, 15% validation, and 15% testing sets."

The skill will:
1. Generate Python code to read the 'my_data.csv' file.
2. Execute the code to split the data according to the specified proportions, creating 'train.csv', 'validation.csv', and 'test.csv' files.

### Example 2: Creating a Train-Test Split

User request: "Create a train-test split of 'large_dataset.csv' with an 80/20 ratio."

The skill will:
1. Generate Python code to load 'large_dataset.csv'.
2. Execute the code to split the dataset into 80% training and 20% testing sets, saving them as 'train.csv' and 'test.csv'.

## Best Practices

- **Data Integrity**: Verify that the splitting process maintains the integrity of the data, ensuring no data loss or corruption.
- **Stratification**: Consider stratification when splitting imbalanced datasets to maintain class distributions in each subset.
- **Randomization**: Ensure the splitting proces

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
2779 chars