LLM cost tracking with Langfuse for cached responses. Use when monitoring cache effectiveness, tracking cost savings, or attributing costs to agents in multi-agent systems.
January 23, 2026
# Cache Cost Tracking
Monitor LLM costs and cache effectiveness.
## Langfuse Automatic Tracking
```python
from uuid import UUID

from langfuse.decorators import observe, langfuse_context


@observe(as_type="generation")
async def call_llm_with_cache(
    prompt: str,
    agent_type: str,
    analysis_id: UUID,
) -> str:
    """LLM call with automatic cost tracking."""
    # Link this generation to the parent trace
    langfuse_context.update_current_trace(
        name=f"{agent_type}_generation",
        session_id=str(analysis_id),
    )

    # L1: exact-match in-process cache
    cache_key = (agent_type, prompt)
    if cache_key in lru_cache:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L1", "cache_hit": True}
        )
        return lru_cache[cache_key]

    # L2: semantic-similarity cache
    similar = await semantic_cache.get(prompt, agent_type)
    if similar:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L2", "cache_hit": True}
        )
        return similar

    # Cache miss (L4): real LLM call - Langfuse tracks tokens/cost automatically
    response = await llm.generate(prompt)
    langfuse_context.update_current_observation(
        metadata={
            "cache_layer": "L4",
            "cache_hit": False,
            # Nonzero cache_read_input_tokens means the provider served
            # part of the prompt from its own prompt cache
            "prompt_cache_hit": response.usage.cache_read_input_tokens > 0,
        }
    )
    return response.content
```
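Once the cache-hit metadata lands in Langfuse, you can put a dollar figure on cache effectiveness offline. A minimal sketch, assuming per-MTok prices you supply yourself (the `estimate_savings` helper and the example prices are illustrative, not part of Langfuse):

```python
def estimate_savings(
    hits: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_mtok: float = 3.00,    # assumed input price, $/MTok
    output_price_per_mtok: float = 15.00,  # assumed output price, $/MTok
) -> float:
    """Dollars saved by serving `hits` requests from cache instead of the LLM."""
    per_call = (
        avg_input_tokens * input_price_per_mtok
        + avg_output_tokens * output_price_per_mtok
    ) / 1_000_000
    return hits * per_call


# 10,000 cache hits at ~2k input / ~500 output tokens per avoided call
print(round(estimate_savings(10_000, 2_000, 500), 2))  # → 135.0
```

Feed `hits` from a Langfuse metadata query (count of observations with `cache_hit: true`) and the token averages from your miss-path generations.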
## Hierarchical Cost Rollup
```python
class AnalysisWorkflow:
    @observe()  # root call opens the trace; child observations nest under it
    async def run_analysis(self, url: str, analysis_id: UUID):
        """Parent trace aggregates child costs.

        Trace Hierarchy:
            run_analysis (trace)
            ├── security_agent (generation)
            ├── tech_agent (generation)
            └── synthesis (generation)
        """
        langfuse_context.update_current_trace(
            name="content_analysis",
            session_id=str(analysis_id),
            tags=["multi-agent"],
        )
        content = await self.fetch_content(url)  # fetch once, share across agents
        for agent in self.agents:
            await self.run_agent(agent, content, analysis_id)

    @observe(as_type="generation")
    async def run_agent(self, agent, content: str, analysis_id: UUID) -> str:
        """Child generation; its tokens and cost roll up into the parent trace."""
        # build_prompt is workflow-specific; each agent templates its own prompt
        return await call_llm_with_cache(
            prompt=agent.build_prompt(content),
            agent_type=agent.agent_type,
            analysis_id=analysis_id,
        )
```
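With every generation tagged with `cache_layer` metadata, per-layer hit rates can be rolled up from exported observations. A small sketch, assuming observations arrive as plain dicts with a `metadata` field (field names illustrative; adapt to whatever your Langfuse export or fetch API returns):

```python
from collections import Counter


def cache_hit_rates(observations: list[dict]) -> dict[str, float]:
    """Fraction of calls served by each cache layer (misses show up as L4)."""
    layers = Counter(
        obs.get("metadata", {}).get("cache_layer", "unknown")
        for obs in observations
    )
    total = sum(layers.values()) or 1  # avoid division by zero on empty input
    return {layer: count / total for layer, count in layers.items()}


obs = [
    {"metadata": {"cache_layer": "L1", "cache_hit": True}},
    {"metadata": {"cache_layer": "L1", "cache_hit": True}},
    {"metadata": {"cache_layer": "L2", "cache_hit": True}},
    {"metadata": {"cache_layer": "L4", "cache_hit": False}},
]
print(cache_hit_rates(obs))  # {'L1': 0.5, 'L2': 0.25, 'L4': 0.25}
```

A rising L1/L2 share over time is the cheapest signal that the cache layers are actually absorbing traffic before it reaches the LLM.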