Create efficient data pipelines with tf.data
GitHub: TheBushidoCollective/han
jutsu-tensorflow
January 24, 2026
npx add-skill https://github.com/TheBushidoCollective/han/blob/main/jutsu/jutsu-tensorflow/skills/tensorflow-data-pipelines/SKILL.md -a claude-code --skill tensorflow-data-pipelines

Installation paths: .claude/skills/tensorflow-data-pipelines/

# TensorFlow Data Pipelines
Build efficient, scalable data pipelines using the tf.data API for optimal training performance. This skill covers dataset creation, transformations, batching, shuffling, prefetching, and advanced optimization techniques to maximize GPU/TPU utilization.
## Dataset Creation
### From Tensor Slices
```python
import tensorflow as tf
import numpy as np
# Create dataset from numpy arrays
x_train = np.random.rand(1000, 28, 28, 1)
y_train = np.random.randint(0, 10, 1000)
# Method 1: from_tensor_slices
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Apply transformations
dataset = dataset.shuffle(buffer_size=1024)
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
# Iterate through dataset
for batch_x, batch_y in dataset.take(2):
    print(f"Batch shape: {batch_x.shape}, Labels shape: {batch_y.shape}")
```
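Because 1000 examples do not divide evenly by a batch size of 32, the final batch is smaller than the others. The following is a minimal sketch, assuming the same array shapes as above, that uses `element_spec` and `cardinality()` to check what the pipeline yields. `drop_remainder=True` is shown as one option when fixed batch shapes are required; whether dropping examples is acceptable depends on the training setup.

```python
import numpy as np
import tensorflow as tf

# Same illustrative shapes as the example above.
x_train = np.random.rand(1000, 28, 28, 1)
y_train = np.random.randint(0, 10, 1000)

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# The batch dimension is reported as None because the last batch may be partial.
print(dataset.element_spec)

# ceil(1000 / 32) batches: 31 full batches plus one batch of 8 examples.
num_batches = int(dataset.cardinality().numpy())

# Walk the dataset once to observe the size of the final batch.
last_batch_size = None
for batch_x, _ in dataset:
    last_batch_size = int(batch_x.shape[0])

# drop_remainder=True discards the partial batch and fixes the batch dimension.
fixed = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(
    32, drop_remainder=True
)
num_fixed_batches = int(fixed.cardinality().numpy())
```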
### From Generator Functions
```python
def data_generator():
    """Generator function for custom data loading."""
    for i in range(1000):
        # Simulate loading data from disk or API
        x = np.random.rand(28, 28, 1).astype(np.float32)
        y = np.random.randint(0, 10)
        yield x, y

# Create dataset from generator
dataset = tf.data.Dataset.from_generator(
    data_generator,
    output_signature=(
        tf.TensorSpec(shape=(28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32)
    )
)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```
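When the generator needs a parameter, such as the number of samples to produce, `from_generator` accepts an `args` tuple whose values are passed to the generator as NumPy values. The sketch below assumes a hypothetical `sized_generator` function and a sample count of 100; both are illustrative and not part of the original skill.

```python
import numpy as np
import tensorflow as tf

def sized_generator(num_samples):
    """Hypothetical generator; num_samples arrives as a NumPy value."""
    for _ in range(int(num_samples)):
        # Simulated sample, matching the shapes used above.
        x = np.random.rand(28, 28, 1).astype(np.float32)
        y = np.int32(np.random.randint(0, 10))
        yield x, y

dataset = tf.data.Dataset.from_generator(
    sized_generator,
    args=(100,),  # forwarded to sized_generator as its arguments
    output_signature=(
        tf.TensorSpec(shape=(28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Count elements by iterating once over the unbatched dataset.
num_elements = sum(1 for _ in dataset)
```

Declaring `output_signature` lets `tf.data` know the element shapes and dtypes without running the generator first.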
### From Dataset Range
```python
# Create simple range dataset
dataset = tf.data.Dataset.range(1000)
# Use with custom mapping
dataset = dataset.map(lambda x: (tf.random.normal([28, 28, 1]), x % 10))
dataset = dataset.batch(32)
```
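For heavier mapping functions, the same range-based pipeline can run the map step in parallel and overlap it with consumption. This is a sketch only: `make_example` is a hypothetical stand-in for a real transform, and the actual speedup depends on the workload and hardware.

```python
import tensorflow as tf

def make_example(index):
    """Hypothetical transform: a random image with a label derived from the index."""
    image = tf.random.normal([28, 28, 1])
    label = index % 10
    return image, label

dataset = (
    tf.data.Dataset.range(1000)
    # Let tf.data choose how many elements to process in parallel.
    .map(make_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    # Prepare upcoming batches while the current one is being consumed.
    .prefetch(tf.data.AUTOTUNE)
)

# Element order is preserved by default, even with a parallel map.
first_images, first_labels = next(iter(dataset))
```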
## Data Transformation
### Normalization Pipeline
```python
def normalize(image, label):
    """Normalize pixel values."""
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Apply normalization
train_dataset = (
tf.data.Dataset.from_tensor_slices((x_