tensorflow-data-pipelines

Create efficient data pipelines with tf.data

Plugin: jutsu-tensorflow
Repository: TheBushidoCollective/han (60 stars)
Path: jutsu/jutsu-tensorflow/skills/tensorflow-data-pipelines/SKILL.md
Last Verified: January 24, 2026

Install:
npx add-skill https://github.com/TheBushidoCollective/han/blob/main/jutsu/jutsu-tensorflow/skills/tensorflow-data-pipelines/SKILL.md -a claude-code --skill tensorflow-data-pipelines

Installation path (Claude): .claude/skills/tensorflow-data-pipelines/

Instructions

# TensorFlow Data Pipelines

Build efficient, scalable data pipelines using the tf.data API for optimal training performance. This skill covers dataset creation, transformations, batching, shuffling, prefetching, and advanced optimization techniques to maximize GPU/TPU utilization.
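
The sections below cover each stage separately; as a quick orientation, here is a minimal end-to-end sketch of how they fit together and feed `Model.fit`. The tiny model and random data are placeholder assumptions for illustration, not part of the skill's examples.

```python
import numpy as np
import tensorflow as tf

# Placeholder data and model, assumed for illustration only
x_train = np.random.rand(1000, 28, 28, 1).astype(np.float32)
y_train = np.random.randint(0, 10, 1000)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Build the input pipeline: shuffle -> batch -> prefetch
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# tf.data.Dataset objects can be passed directly to fit()
model.fit(train_ds, epochs=2)
```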

## Dataset Creation

### From Tensor Slices

```python
import tensorflow as tf
import numpy as np

# Create dataset from numpy arrays
x_train = np.random.rand(1000, 28, 28, 1)
y_train = np.random.randint(0, 10, 1000)

# Method 1: from_tensor_slices
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))

# Apply transformations
dataset = dataset.shuffle(buffer_size=1024)
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)

# Iterate through dataset
for batch_x, batch_y in dataset.take(2):
    print(f"Batch shape: {batch_x.shape}, Labels shape: {batch_y.shape}")
```

### From Generator Functions

```python
def data_generator():
    """Generator function for custom data loading."""
    for i in range(1000):
        # Simulate loading data from disk or API
        x = np.random.rand(28, 28, 1).astype(np.float32)
        y = np.random.randint(0, 10)
        yield x, y

# Create dataset from generator
dataset = tf.data.Dataset.from_generator(
    data_generator,
    output_signature=(
        tf.TensorSpec(shape=(28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32)
    )
)

dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```
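
If the generator needs parameters, such as a sample count or a file path, `from_generator` also accepts an `args` tuple whose values are evaluated and handed to the generator as NumPy values. The parameterized generator below is an assumed variant of the example above, shown only as a sketch.

```python
def parameterized_generator(num_samples):
    """Yield `num_samples` random examples; the argument arrives as a NumPy value."""
    for _ in range(int(num_samples)):
        x = np.random.rand(28, 28, 1).astype(np.float32)
        y = np.random.randint(0, 10)
        yield x, y

dataset = tf.data.Dataset.from_generator(
    parameterized_generator,
    args=(500,),  # forwarded to the generator at iteration time
    output_signature=(
        tf.TensorSpec(shape=(28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```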

### From Dataset Range

```python
# Create simple range dataset
dataset = tf.data.Dataset.range(1000)

# Use with custom mapping
dataset = dataset.map(lambda x: (tf.random.normal([28, 28, 1]), x % 10))
dataset = dataset.batch(32)
```
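
Another creation pattern worth knowing, not shown above, is pairing separately built feature and label datasets with `tf.data.Dataset.zip`. The arrays here are placeholder assumptions.

```python
# Two independent datasets of the same length
features = tf.data.Dataset.from_tensor_slices(
    np.random.rand(1000, 28, 28, 1).astype(np.float32)
)
labels = tf.data.Dataset.from_tensor_slices(np.random.randint(0, 10, 1000))

# zip pairs them element-wise into (feature, label) tuples
dataset = tf.data.Dataset.zip((features, labels))
dataset = dataset.shuffle(buffer_size=1024).batch(32).prefetch(tf.data.AUTOTUNE)
```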

## Data Transformation

### Normalization Pipeline

```python
def normalize(image, label):
    """Normalize pixel values."""
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Apply normalization
train_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```
