Complete llama.cpp C/C++ API reference covering model loading, inference, text generation, embeddings, chat, tokenization, sampling, batching, KV cache, LoRA adapters, and state management. Triggers on: llama.cpp questions, LLM inference code, GGUF models, local AI/ML inference, C/C++ LLM integration, "how do I use llama.cpp", API function lookups, implementation questions, troubleshooting llama.cpp issues, and any llama-cpp or ggerganov/llama.cpp mentions.
**Install**: `npx add-skill https://github.com/datathings/marketplace/blob/main/plugins/llamacpp/skills/llamacpp/SKILL.md -a claude-code --skill llamacpp`

**Installation path**: `.claude/skills/llamacpp/`
# llama.cpp C API Guide

Comprehensive reference for the llama.cpp C API, documenting all non-deprecated functions and common usage patterns.

## Overview

llama.cpp is a C/C++ implementation for LLM inference with minimal dependencies and state-of-the-art performance. This skill provides:

- **Complete API Reference**: All non-deprecated functions organized by category
- **Common Workflows**: Working examples for typical use cases
- **Best Practices**: Patterns for efficient and correct API usage

## Quick Start

See **[references/workflows.md](references/workflows.md)** for complete working examples.

Basic workflow:

1. `llama_backend_init()` - Initialize backend
2. `llama_model_load_from_file()` - Load model
3. `llama_init_from_model()` - Create context
4. `llama_tokenize()` - Convert text to tokens
5. `llama_decode()` - Process tokens
6. `llama_sampler_sample()` - Sample next token
7. Cleanup in reverse order

## When to Use This Skill

Use this skill when:

1. **API Lookup**: You need to find a specific function (e.g., "How do I load a model?", "What function creates a context?")
2. **Code Generation**: You're writing C code that uses llama.cpp
3. **Workflow Guidance**: You need to understand the steps for a task (e.g., text generation, embeddings, chat)
4. **Advanced Features**: You're working with batches, sequences, LoRA adapters, state management, or custom sampling
5. **Migration**: You're updating code from deprecated functions to the current API

## Core Concepts

### Key Objects

- **`llama_model`**: Loaded model weights and architecture
- **`llama_context`**: Inference state (KV cache, compute buffers)
- **`llama_batch`**: Input tokens and positions for processing
- **`llama_sampler`**: Token sampling configuration
- **`llama_vocab`**: Vocabulary and tokenizer
- **`llama_memory_t`**: KV cache memory handle

### Typical Flow

1. **Initialize**: `llama_backend_init()`
2. **Load Model**: `llama_model_load_from_file()`
3. **Create Context**: `llama_init_from_model()`
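A minimal sketch of this flow end to end, assuming a GGUF model at a hypothetical path `model.gguf`; greedy sampling stands in for a full sampler chain, and error handling is mostly trimmed for brevity:

```c
// Minimal greedy text generation with the llama.cpp C API.
// The model path "model.gguf" and the 64-token generation cap are
// illustrative assumptions, not part of the library.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    // Load the model and grab its vocabulary/tokenizer
    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }
    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    // Create a context (holds the KV cache and compute buffers)
    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    struct llama_context * ctx = llama_init_from_model(model, cparams);

    // Tokenize the prompt; a negative return value is the required token count
    const char * prompt = "Hello, world";
    int n_prompt = -llama_tokenize(vocab, prompt, (int) strlen(prompt),
                                   NULL, 0, true, true);
    llama_token * tokens = malloc(n_prompt * sizeof(llama_token));
    llama_tokenize(vocab, prompt, (int) strlen(prompt),
                   tokens, n_prompt, true, true);

    // Greedy sampler chain
    struct llama_sampler * smpl =
        llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Decode the prompt, then feed each sampled token back in
    struct llama_batch batch = llama_batch_get_one(tokens, n_prompt);
    for (int i = 0; i < 64; i++) {
        if (llama_decode(ctx, batch) != 0) break;

        llama_token new_token = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, new_token)) break;

        char piece[128];
        int n = llama_token_to_piece(vocab, new_token, piece, sizeof(piece), 0, true);
        if (n > 0) fwrite(piece, 1, n, stdout);

        batch = llama_batch_get_one(&new_token, 1);
    }
    printf("\n");

    // Cleanup in reverse order of creation
    free(tokens);
    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Note how the teardown mirrors step 7 of the Quick Start: sampler, context, model, then backend, in the reverse of the order they were created.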