# LLM Basics

LLM architecture, tokenization, transformers, and inference optimization. Use this skill for understanding and working with language models.
Master the fundamentals of Large Language Models.
## Quick Start
### Using OpenAI API
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers briefly."},
    ],
    temperature=0.7,  # sampling randomness (0-2)
    max_tokens=500,   # upper bound on generated tokens
)

print(response.choices[0].message.content)
```
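For interactive use, you can stream tokens as they are generated instead of waiting for the full completion. A minimal sketch with the same client; `stream=True` and the per-chunk `delta` field are part of the official OpenAI Python SDK:

```python
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain transformers briefly."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a small delta of the assistant message
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```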
### Using Hugging Face
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Gated model: requires accepting the license on Hugging Face first
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to reduce memory use
)

inputs = tokenizer("Hello, how are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
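For quick experiments, the higher-level `pipeline` API wraps tokenization, generation, and decoding in a single call. A minimal sketch using the small, ungated `gpt2` checkpoint:

```python
from transformers import pipeline

# pipeline handles tokenization, generation, and decoding internally
generator = pipeline("text-generation", model="gpt2")
result = generator("Hello, how are", max_new_tokens=20)
print(result[0]["generated_text"])
```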
## Core Concepts
### Transformer Architecture
```
Input → Embedding (+ positional encoding) → [N × Transformer Block] → Output

Transformer Block (post-norm: each sublayer is wrapped in a
residual connection, then normalized):
┌───────────────────────────┐
│ Multi-Head Self-Attention │
├───────────────────────────┤
│ Layer Normalization       │
├───────────────────────────┤
│ Feed-Forward Network      │
├───────────────────────────┤
│ Layer Normalization       │
└───────────────────────────┘
```
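The following is a minimal PyTorch sketch of this post-norm layout; the `TransformerBlock` class and its dimensions are illustrative, not taken from any specific library:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Post-norm block: sublayer → residual add → LayerNorm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sublayer with residual connection
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer with residual connection
        x = self.norm2(x + self.ffn(x))
        return x

# Usage: batch of 2 sequences, 10 tokens each, model dim 512
block = TransformerBlock()
print(block(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```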
### Tokenization
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello, world!"
# Encode
tokens = tokenizer.encode(text)
print(tokens) # [15496, 11, 995, 0]
# Decode
decoded = tokenizer.decode(tokens)
print(decoded) # "Hello, world!"
```
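To see the subword pieces rather than IDs, inspect the token strings; in GPT-2's byte-level BPE, `Ġ` marks a token that begins with a space:

```python
print(tokenizer.tokenize("Hello, world!"))
# ['Hello', ',', 'Ġworld', '!']
```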
### Key Parameters
```python
# Generation parameters (OpenAI-style names; top_k is a
# Hugging Face / local-inference knob, not an OpenAI API parameter)
params = {
    'temperature': 0.7,      # Sampling randomness (0-2; lower = more deterministic)
    'max_tokens': 1000,      # Upper bound on generated tokens
    'top_p': 0.9,            # Nucleus sampling: smallest token set with cumulative prob ≥ 0.9
    'top_k': 50,             # Sample only from the 50 most likely tokens
    'frequency_penalty': 0,  # Penalize tokens by how often they already appear (reduces repetition)
    'presence_penalty': 0    # Penalize tokens that have appeared at all (encourages new topics)
}
```
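A hedged sketch of how these map onto Hugging Face generation, reusing `model` and `inputs` from the Quick Start above (names differ: `max_tokens` becomes `max_new_tokens`, and the two penalties roughly correspond to `repetition_penalty`):

```python
outputs = model.generate(
    **inputs,
    do_sample=True,          # enable sampling; otherwise decoding is greedy
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    max_new_tokens=100,
    repetition_penalty=1.1,  # HF's rough analogue of frequency/presence penalties
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```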
## Model Comparison
| Model | Parameters | Con