Deep-dive data profiling for a specific table. Use when the user asks to profile a table, wants statistics about a dataset, asks about data quality, or needs to understand a table's structure and content. Requires a table name.
View on GitHubSelect agents to install to:
npx add-skill https://github.com/astronomer/agents/blob/main/skills/profiling-tables/SKILL.md -a claude-code --skill profiling-tablesInstallation paths:
.claude/skills/profiling-tables/# Data Profile
Generate a comprehensive profile of a table that a new team member could use to understand the data.
## Step 1: Basic Metadata
Query column metadata:
```sql
SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION
```
If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
## Step 2: Size and Shape
Run via `run_sql`:
```sql
SELECT
COUNT(*) as total_rows,
COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>
```
## Step 3: Column-Level Statistics
For each column, gather appropriate statistics based on data type:
### Numeric Columns
```sql
SELECT
MIN(column_name) as min_val,
MAX(column_name) as max_val,
AVG(column_name) as avg_val,
STDDEV(column_name) as std_dev,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
```
### String Columns
```sql
SELECT
MIN(LEN(column_name)) as min_length,
MAX(LEN(column_name)) as max_length,
AVG(LEN(column_name)) as avg_length,
SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
```
### Date/Timestamp Columns
```sql
SELECT
MIN(column_name) as earliest,
MAX(column_name) as latest,
DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
```
## Step 4: Cardinality Analysis
For columns that look like categorical/dimension keys:
```sql
SELECT
column_name,
COUNT(*) as frequency,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20
```
This reveals:
- High-cardinality columns (li