
axiom-vision


subject segmentation, VNGenerateForegroundInstanceMaskRequest, isolate object from hand, VisionKit subject lifting, image foreground detection, instance masks, class-agnostic segmentation, VNRecognizeTextRequest, OCR, VNDetectBarcodesRequest, DataScannerViewController, document scanning, RecognizeDocumentsRequest


Marketplace: axiom-marketplace (CharlesWiltgen/Axiom)

Plugin: axiom

Repository: CharlesWiltgen/Axiom (289 stars)

.claude-plugin/plugins/axiom/skills/axiom-vision/SKILL.md

Last Verified: January 16, 2026

Install Skill

npx add-skill https://github.com/CharlesWiltgen/Axiom/blob/main/.claude-plugin/plugins/axiom/skills/axiom-vision/SKILL.md -a claude-code --skill axiom-vision

Installation path (Claude): .claude/skills/axiom-vision/

Instructions

# Vision Framework Computer Vision

Guides you through implementing computer vision: subject segmentation, hand/body pose detection, person detection, text recognition, barcode detection, document scanning, and combining Vision APIs to solve complex problems.

## When to Use This Skill

Use when you need to:
- ☑ Isolate subjects from backgrounds (subject lifting)
- ☑ Detect and track hand poses for gestures
- ☑ Detect and track body poses for fitness/action classification
- ☑ Segment multiple people separately
- ☑ Exclude hands from object bounding boxes (combining APIs)
- ☑ Choose between VisionKit and Vision framework
- ☑ Combine Vision with CoreImage for compositing
- ☑ Decide which Vision API solves your problem
- ☑ Recognize text in images (OCR)
- ☑ Detect barcodes and QR codes
- ☑ Scan documents with perspective correction
- ☑ Extract structured data from documents (iOS 26+)
- ☑ Build live scanning experiences (DataScannerViewController)
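
For the first use case, a minimal sketch of subject lifting with `VNGenerateForegroundInstanceMaskRequest` (iOS 17+/macOS 14+). The `liftSubject` helper name and the `cgImage` input are illustrative, not part of the skill:

```swift
import Vision
import CoreImage

// Sketch: isolate the foreground subject(s) from an image and
// return a masked pixel buffer with the background removed.
func liftSubject(from cgImage: CGImage) throws -> CVPixelBuffer? {
    let request = VNGenerateForegroundInstanceMaskRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    guard let observation = request.results?.first else { return nil }
    // Produce a single masked image containing every detected instance;
    // pass a subset of indices instead to lift one object at a time.
    return try observation.generateMaskedImage(
        ofInstances: observation.allInstances,
        from: handler,
        croppedToInstancesExtent: false
    )
}
```

Run this off the main thread; `perform(_:)` is synchronous and the segmentation model is not free.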

## Example Prompts

"How do I isolate a subject from the background?"
"I need to detect hand gestures like pinch"
"How can I get a bounding box around an object **without including the hand holding it**?"
"Should I use VisionKit or Vision framework for subject lifting?"
"How do I segment multiple people separately?"
"I need to detect body poses for a fitness app"
"How do I preserve HDR when compositing subjects on new backgrounds?"
"How do I recognize text in an image?"
"I need to scan QR codes from camera"
"How do I extract data from a receipt?"
"Should I use DataScannerViewController or Vision directly?"
"How do I scan documents and correct perspective?"
"I need to extract table data from a document"

## Red Flags

Signs you're making this harder than it needs to be:
- ❌ Manually implementing subject segmentation with CoreML models
- ❌ Using ARKit just for body pose (Vision works offline)
- ❌ Writing gesture recognition from scratch (use hand pose + simple distance checks)
- ❌ Processing on main thread (blocks UI - Vision requests are synchronous; run them on a background queue)
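
In the same spirit as the red flags above, basic OCR does not need a custom model either; a sketch with `VNRecognizeTextRequest` (the `recognizeText` helper name is an assumption):

```swift
import Vision

// Sketch: recognize text in an image and return one string per line,
// taking the top candidate for each observation.
func recognizeText(in cgImage: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // .fast trades accuracy for speed
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```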
