Work with labeled multidimensional arrays for scientific data analysis using Xarray. Use when handling climate data, satellite imagery, oceanographic data, or any multidimensional datasets with coordinates and metadata. Ideal for NetCDF/HDF5 files, time series analysis, and large datasets requiring lazy loading with Dask.
View on GitHub: uw-ssec/rse-plugins
plugins/scientific-domain-applications/skills/xarray-for-multidimensional-data/SKILL.md
January 22, 2026
npx add-skill https://github.com/uw-ssec/rse-plugins/blob/main/plugins/scientific-domain-applications/skills/xarray-for-multidimensional-data/SKILL.md -a claude-code --skill xarray-for-multidimensional-data
Installation paths:
.claude/skills/xarray-for-multidimensional-data/
# Xarray for Multidimensional Data
Master **Xarray**, the powerful library for working with labeled multidimensional arrays in scientific Python. Learn how to efficiently handle complex datasets with multiple dimensions, coordinates, and metadata, from climate data and satellite imagery to experimental measurements and simulations.
**Official Documentation**: https://docs.xarray.dev/
**GitHub**: https://github.com/pydata/xarray
## Quick Reference Card
### Installation & Setup
```bash
# Using pixi (recommended for scientific projects)
pixi add xarray netcdf4 dask
# Using pip
pip install "xarray[complete]"  # quoted so shells such as zsh do not expand the brackets
# Optional dependencies for specific formats
pixi add zarr h5netcdf scipy bottleneck
# Geospatial extensions (for raster data, CRS handling, reprojection)
pixi add rioxarray xesmf
# DataTree is built into Xarray (2024.10.0 and later), so no separate installation is needed
```
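After installing, it can help to confirm which optional backends are actually importable in the active environment. The snippet below is a minimal sketch using only the standard library; the list `optional_modules` is an assumption based on the packages named above, written with their import names (for example, the `netcdf4` package is imported as `netCDF4`).

```python
import importlib.util

# Import names of the optional packages mentioned above (assumed list;
# adjust it to the formats you actually need).
optional_modules = [
    "netCDF4", "h5netcdf", "zarr", "dask",
    "scipy", "bottleneck", "rioxarray", "xesmf",
]

# find_spec returns None when a top-level module cannot be found,
# so nothing is imported and nothing fails if a package is missing.
available = {
    name: importlib.util.find_spec(name) is not None
    for name in optional_modules
}

for name, found in available.items():
    print(f"{name:12s} {'ok' if found else 'missing'}")
```

A package reported as missing only matters if you use the matching file format or feature.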
### Essential Xarray Concepts
```python
import numpy as np
import pandas as pd
import xarray as xr

# DataArray: a single labeled array
temperature = xr.DataArray(
    data=np.random.randn(3, 4),
    dims=["time", "location"],
    coords={
        # Parse the dates so datetime accessors such as "time.month" work
        "time": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "location": ["A", "B", "C", "D"],
    },
    name="temperature",
)

# Dataset: a collection of DataArrays that share coordinates
ds = xr.Dataset({
    "temperature": temperature,
    "pressure": (["time", "location"], np.random.randn(3, 4)),
})
```
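To make the DataArray and Dataset distinction concrete, the following sketch inspects the structure of a small array and shows that arithmetic aligns on labels rather than on position. The values, the `units` attribute, and the location names are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 3 daily time steps at 4 made-up locations.
times = pd.date_range("2024-01-01", periods=3, freq="D")
temperature = xr.DataArray(
    data=np.arange(12.0).reshape(3, 4),
    dims=["time", "location"],
    coords={"time": times, "location": ["A", "B", "C", "D"]},
    name="temperature",
    attrs={"units": "degC"},  # metadata travels with the array
)
ds = xr.Dataset({"temperature": temperature})

# Dimension names and sizes are part of the object itself.
dims = temperature.dims
sizes = dict(temperature.sizes)

# Pulling a variable out of a Dataset gives back a DataArray
# with the same values and coordinates.
same = ds["temperature"].equals(temperature)

# Arithmetic aligns on coordinate labels: only the locations present
# in both operands ("B" and "D") appear in the result.
subset = temperature.sel(location=["B", "D"])
diff = temperature - subset

print(dims, sizes, same)
print(diff["location"].values)
```

Because alignment uses labels, the subtraction does not need the two operands to have the same shape or ordering.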
### Essential Operations
```python
# Selection by label
ds.sel(time="2024-01-01")
ds.sel(location="A")
# Selection by index
ds.isel(time=0)
# Slicing
ds.sel(time=slice("2024-01-01", "2024-01-02"))
# Aggregation
ds.mean(dim="time")
ds.sum(dim="location")
# Computation
ds["temperature"] + 273.15  # Celsius to Kelvin
ds.groupby("time.month").mean()  # requires a datetime64 time coordinate
# I/O operations
ds.to_netcdf("data.nc")
ds = xr.open_dataset("data.nc")
```
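The operations above differ in a few details that are easy to miss, so here is a small sketch under stated assumptions: a hypothetical 60-day series named `day_index` whose values are simply the day offset. It contrasts label-based and positional slicing, then shows nearest-neighbor selection and a monthly group mean.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series: 60 daily steps from 2024-01-01 (all of January
# plus all of February 2024), with the day offset as the value.
times = pd.date_range("2024-01-01", periods=60, freq="D")
da = xr.DataArray(
    np.arange(60.0), dims="time", coords={"time": times}, name="day_index"
)

# Label-based slices include both endpoints ...
window = da.sel(time=slice("2024-01-01", "2024-01-02"))
# ... while positional slices exclude the stop index, as in NumPy.
first = da.isel(time=slice(0, 1))

# Nearest-neighbor lookup for a timestamp that is not on the grid.
nearest = da.sel(time=np.datetime64("2024-01-10T09:00", "ns"), method="nearest")

# Group by calendar month using the datetime accessor, then average.
monthly = da.groupby("time.month").mean()

print(window.sizes["time"], first.sizes["time"])
print(float(nearest))
print(monthly["month"].values, monthly.values)
```

The `"time.month"` accessor only works because `time` holds real datetimes; with plain string labels the group-by would fail.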
### Quick Decision Tree
```
Working with multidimensional scientific data?
├─ YES → Use Xarray for labeled dimensions
└─ NO → NumPy/Pandas sufficient
Need to track coordinates and metadata?