DataFrame schema validation using pandera. Schema definitions, column checks, and decorator-based validation.
majesticlabs-dev/majestic-marketplace
majestic-data
plugins/majestic-data/skills/pandera-validation/SKILL.md
January 24, 2026
# Pandera Validation
**Audience:** Data engineers validating pandas DataFrames.
**Goal:** Provide pandera patterns for schema validation and type checking.
## Scripts
Import the schema helpers from `scripts/schemas.py`:
```python
from scripts.schemas import (
create_user_schema,
create_nullable_schema,
create_date_range_schema,
UserSchema,
validate_with_errors,
infer_and_export_schema
)
```
## Usage Examples
### Basic Schema Validation
```python
from scripts.schemas import create_user_schema
schema = create_user_schema()
validated_df = schema.validate(df)
```
### Collect All Errors
```python
from scripts.schemas import create_user_schema, validate_with_errors
schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)
if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")
```
### Class-Based Schema
```python
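import pandas as pd
import pandera as pa
from scripts.schemas import UserSchema

# Validate a DataFrame directly against the class-based schema
UserSchema.validate(df)

# Use the schema as a type hint; check_types enforces it at call time
@pa.check_types
def process_users(df: pa.typing.DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")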
```
### Infer Schema from DataFrame
```python
from scripts.schemas import infer_and_export_schema
schema_export = infer_and_export_schema(df)
print(schema_export['python_code']) # Python schema definition
print(schema_export['yaml']) # YAML schema
```
## Built-in Checks Reference
| Check Type | Example | Description |
|------------|---------|-------------|
| Numeric | `Check.gt(0)`, `Check.in_range(0, 100)` | Greater than 0; within 0 to 100 (bounds inclusive by default) |
| String | `Check.str_matches(r'pattern')` | Regex must match from the start of the string |
| Set membership | `Check.isin(['A', 'B'])` | Allowed values |
| Uniqueness | `unique=True` on Column | No duplicates |
| Nullable | `nullable=True` on Column | Allow nulls |
## Decorator-Based Validation
```python
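import pandas as pd
import pandera as pa

# `schema` is an existing DataFrameSchema, e.g. the result of create_user_schema()
@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(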