# LangChain Performance Tuning
## Overview
Optimize LangChain applications for lower latency, higher throughput, and efficient resource utilization.
## Prerequisites
- Working LangChain application
- Performance baseline measurements
- Profiling tools available
## Instructions
### Step 1: Measure Baseline Performance
```python
import time
import statistics
from typing import Callable

def benchmark(func: Callable, iterations: int = 10) -> dict:
    """Benchmark a function's performance."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        times.append(elapsed)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0,
        "min": min(times),
        "max": max(times),
    }

# Usage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def test_call():
    llm.invoke("Hello!")

results = benchmark(test_call, iterations=5)
print(f"Mean latency: {results['mean']:.3f}s")
```
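Real applications usually invoke a composed chain (prompt, model, output parser) rather than a bare model call, so the chain as a whole is the more useful thing to time. A minimal sketch reusing the `benchmark` helper and `llm` from the snippet above; the prompt text and input are illustrative assumptions:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Compose a small chain: prompt -> model -> string output
summarize_prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
summarize_chain = summarize_prompt | llm | StrOutputParser()

def test_chain_call():
    # assumes `llm` from the previous snippet is in scope
    summarize_chain.invoke({"text": "LangChain is a framework for building LLM applications."})

chain_results = benchmark(test_chain_call, iterations=5)
print(f"Chain mean latency: {chain_results['mean']:.3f}s")
```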
### Step 2: Enable Response Caching
```python
from langchain_core.globals import set_llm_cache
from langchain_community.cache import InMemoryCache, SQLiteCache, RedisCache

# Option 1: In-memory cache (single process)
set_llm_cache(InMemoryCache())

# Option 2: SQLite cache (persistent, single node)
set_llm_cache(SQLiteCache(database_path=".langchain_cache.db"))

# Option 3: Redis cache (distributed, production)
import redis

redis_client = redis.Redis.from_url("redis://localhost:6379")
set_llm_cache(RedisCache(redis_client))

# Cache hit = ~0ms latency vs ~500-2000ms for an API call
```
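To confirm caching is actually paying off, time the same prompt twice against a cache-enabled model: the first call pays full API latency, while the repeat call should return in milliseconds. A minimal sketch assuming an in-memory cache; exact timings vary by model and network:
```python
import time

from langchain_core.globals import set_llm_cache
from langchain_community.cache import InMemoryCache
from langchain_openai import ChatOpenAI

set_llm_cache(InMemoryCache())
llm = ChatOpenAI(model="gpt-4o-mini")

# First iteration misses the cache (real API call); second is served from cache
for label in ("cold (cache miss)", "warm (cache hit)"):
    start = time.perf_counter()
    llm.invoke("Explain LangChain response caching in one sentence.")
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```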
### Step 3: Optimize Batch Processing
```python
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("{input}")
chain = prompt | llm