llamacpp

Complete llama.cpp C/C++ API reference covering model loading, inference, text generation, embeddings, chat, tokenization, sampling, batching, KV cache, LoRA adapters, and state management. Triggers on: llama.cpp questions, LLM inference code, GGUF models, local AI/ML inference, C/C++ LLM integration, "how do I use llama.cpp", API function lookups, implementation questions, troubleshooting llama.cpp issues, and any llama-cpp or ggerganov/llama.cpp mentions.

Marketplace: datathings/marketplace (4 stars)
Plugin: llamacpp
Skill file: plugins/llamacpp/skills/llamacpp/SKILL.md
Last verified: January 22, 2026

Install

Install with the add-skill CLI:

npx add-skill https://github.com/datathings/marketplace/blob/main/plugins/llamacpp/skills/llamacpp/SKILL.md -a claude-code --skill llamacpp

Installation path (Claude Code): .claude/skills/llamacpp/

Instructions

# llama.cpp C API Guide

Comprehensive reference for the llama.cpp C API, documenting all non-deprecated functions and common usage patterns.

## Overview

llama.cpp is a C/C++ implementation for LLM inference with minimal dependencies and state-of-the-art performance. This skill provides:

- **Complete API Reference**: All non-deprecated functions organized by category
- **Common Workflows**: Working examples for typical use cases
- **Best Practices**: Patterns for efficient and correct API usage

## Quick Start

See **[references/workflows.md](references/workflows.md)** for complete working examples. Basic workflow (a minimal C sketch follows the list):

1. `llama_backend_init()` - Initialize backend
2. `llama_model_load_from_file()` - Load model
3. `llama_init_from_model()` - Create context
4. `llama_tokenize()` - Convert text to tokens
5. `llama_decode()` - Process tokens
6. `llama_sampler_sample()` - Sample next token
7. Cleanup in reverse order

## When to Use This Skill

Use this skill when:

1. **API Lookup**: You need to find a specific function (e.g., "How do I load a model?", "What function creates a context?")
2. **Code Generation**: You're writing C code that uses llama.cpp
3. **Workflow Guidance**: You need to understand the steps for a task (e.g., text generation, embeddings, chat)
4. **Advanced Features**: You're working with batches, sequences, LoRA adapters, state management, or custom sampling
5. **Migration**: You're updating code from deprecated functions to current API

## Core Concepts

### Key Objects

- **`llama_model`**: Loaded model weights and architecture
- **`llama_context`**: Inference state (KV cache, compute buffers)
- **`llama_batch`**: Input tokens and positions for processing
- **`llama_sampler`**: Token sampling configuration (see the chain sketch after this list)
- **`llama_vocab`**: Vocabulary and tokenizer
- **`llama_memory_t`**: KV cache memory handle
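
To make the ownership picture concrete, the sketch below derives a vocab and a memory handle from a model and context, then builds a common sampler chain. It is illustrative rather than canonical: `llama_get_memory()` exists only in newer releases (older builds expose `llama_kv_cache_*` functions instead), and the chain parameters (top-k 40, top-p 0.95, temperature 0.8) are arbitrary example values.

```c
#include "llama.h"

// Sketch of how the key objects relate (assumes `model` and `ctx` already exist).
static struct llama_sampler *make_sampler(const struct llama_model *model,
                                          struct llama_context *ctx) {
    const struct llama_vocab *vocab = llama_model_get_vocab(model); // model owns the vocab
    llama_memory_t mem = llama_get_memory(ctx);                     // KV cache handle
    (void)vocab; (void)mem;

    // Typical chain: top-k -> top-p -> temperature -> seeded distribution sampling
    struct llama_sampler *smpl =
        llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(smpl, llama_sampler_init_top_p(0.95f, 1));
    llama_sampler_chain_add(smpl, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(smpl, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
    return smpl; // caller frees with llama_sampler_free()
}
```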

### Typical Flow

1. **Initialize**: `llama_backend_init()`
2. **Load Model**: `llama_model_load_from_file()`
3. **Create Context**: `llama_init_from_model()`
4. **Tokenize**: `llama_tokenize()`
5. **Decode**: `llama_decode()` (a multi-sequence batching sketch follows the list)
6. **Sample**: `llama_sampler_sample()`
7. **Cleanup**: free objects in reverse order of creation
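
The decode step also accepts hand-filled batches that mix several sequences in one forward pass. Below is a hedged sketch assuming the `llama_batch_init()` field layout from recent `llama.h`; the helper name `decode_two_seqs` is made up for illustration.

```c
#include "llama.h"

// Sketch: one token for each of two sequences in a single llama_batch.
static void decode_two_seqs(struct llama_context *ctx,
                            llama_token tok0, llama_token tok1, llama_pos pos) {
    struct llama_batch batch = llama_batch_init(/*n_tokens=*/2, /*embd=*/0, /*n_seq_max=*/1);

    llama_token toks[2] = { tok0, tok1 };
    for (int i = 0; i < 2; i++) {
        batch.token[i]     = toks[i];
        batch.pos[i]       = pos;  // same position, independent sequences
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = i;    // sequence ids 0 and 1
        batch.logits[i]    = 1;    // request output logits for this token
    }
    batch.n_tokens = 2;

    llama_decode(ctx, batch);      // one forward pass serves both sequences
    llama_batch_free(batch);
}
```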
