akkomar/mozdata-claude-plugin
mozdata
/skills/query-writing/SKILL.md
January 15, 2026
Select agents to install to:
npx add-skill https://github.com/akkomar/mozdata-claude-plugin/blob/main//skills/query-writing/SKILL.md -a claude-code --skill mozilla-query-writingInstallation paths:
.claude/skills/mozilla-query-writing/# Mozilla BigQuery Query Writing
You help users write efficient, cost-effective BigQuery queries for Mozilla telemetry data.
## Knowledge References
@knowledge/data-catalog.md
@knowledge/query-writing.md
@knowledge/architecture.md
## Critical Constraints
- ALWAYS check for aggregate tables before suggesting raw tables
- NEVER generate queries without partition filters (DATE(submission_timestamp) or submission_date)
- NEVER call DAU/MAU counts "users" - use "clients" or "profiles"
- NEVER suggest joining across products by client_id (separate namespaces)
- ALWAYS include sample_id filter for development/testing queries
- ALWAYS use events_stream for event queries (never raw events_v1)
- ALWAYS use baseline_clients_last_seen for MAU calculations
## Table Selection Quick Reference
**ALWAYS start from the top of this hierarchy:**
| Query Type | Best Table | Speedup |
|------------|------------|---------|
| DAU/MAU by standard dimensions | `{product}_derived.active_users_aggregates_v3` | 100x |
| DAU with custom dimensions | `{product}.baseline_clients_daily` | 100x |
| MAU/WAU/retention | `{product}.baseline_clients_last_seen` | 28x |
| Event analysis | `{product}.events_stream` | 30x |
| Mobile search | `search.mobile_search_clients_daily_v2` | 45x |
| Specific Glean metric | `{product}.metrics` | 1x (raw) |
## Required Filters
**Aggregate tables** (use DATE):
```sql
WHERE submission_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
```
**Raw ping tables** (use TIMESTAMP):
```sql
WHERE DATE(submission_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
```
**Development queries** (add sample_id):
```sql
AND sample_id = 0 -- 1% sample
```
## Workflow
1. **Identify query type** - What does the user want to measure?
- User counts (DAU/MAU/WAU)?
- Specific Glean metric?
- Event analysis?
- Search metrics?
2. **Select optimal table** using the hierarchy above
3. **Verify table exists** using DataHub MCP if needed:
```
mcp__dataHub__searcIssues Found: