axiom-vision-ref

verified

Vision framework API, VNDetectHumanHandPoseRequest, VNDetectHumanBodyPoseRequest, person segmentation, face detection, VNImageRequestHandler, recognized points, joint landmarks, VNRecognizeTextRequest, VNDetectBarcodesRequest, DataScannerViewController, VNDocumentCameraViewController, RecognizeDocumentsRequest

Marketplace

axiom-marketplace

CharlesWiltgen/Axiom

Plugin

axiom

Repository

CharlesWiltgen/Axiom
289 stars

.claude-plugin/plugins/axiom/skills/axiom-vision-ref/SKILL.md

Last Verified

January 16, 2026

Install Skill

npx add-skill https://github.com/CharlesWiltgen/Axiom/blob/main/.claude-plugin/plugins/axiom/skills/axiom-vision-ref/SKILL.md -a claude-code --skill axiom-vision-ref

Installation paths:

Claude
.claude/skills/axiom-vision-ref/

Instructions

# Vision Framework API Reference

Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.

## When to Use This Reference

- **Implementing subject lifting** using VisionKit or Vision
- **Detecting hand/body poses** for gesture recognition or fitness apps
- **Segmenting people** from backgrounds or separating multiple individuals
- **Face detection and landmarks** for AR effects or authentication
- **Combining Vision APIs** to solve complex computer vision problems
- **Looking up specific API signatures** and parameter meanings
- **Recognizing text** in images (OCR) with VNRecognizeTextRequest
- **Detecting barcodes** and QR codes with VNDetectBarcodesRequest
- **Building live scanners** with DataScannerViewController
- **Scanning documents** with VNDocumentCameraViewController
- **Extracting structured document data** with RecognizeDocumentsRequest (iOS 26+)

**Related skills**: See `axiom-vision` for decision trees and patterns, and `axiom-vision-diag` for troubleshooting.

## Vision Framework Overview

Vision provides computer vision algorithms for still images and video.

**Core workflow**:
1. Create request (e.g., `VNDetectHumanHandPoseRequest()`)
2. Create handler with image (`VNImageRequestHandler(cgImage: image)`)
3. Perform request (`try handler.perform([request])`)
4. Access observations from `request.results`
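
The four steps above can be sketched as a single function. This is a minimal illustration rather than the skill's canonical pattern: the function name `detectHandPoses(in:)` is hypothetical, and it assumes the caller already has a `CGImage`.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper that walks through the core workflow for one request type.
func detectHandPoses(in image: CGImage) throws -> [VNHumanHandPoseObservation] {
    // 1. Create the request.
    let request = VNDetectHumanHandPoseRequest()

    // 2. Create a handler bound to the image.
    let handler = VNImageRequestHandler(cgImage: image)

    // 3. Perform the request (synchronous; throws on failure).
    try handler.perform([request])

    // 4. Read the observations. `results` is nil if nothing was produced.
    return request.results ?? []
}
```

The same shape applies to other request types: swap the request class and the observation type returned from `results`.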

**Coordinate system**: Lower-left origin, normalized (0.0-1.0) coordinates
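
Because Vision's origin is in the lower-left corner, normalized points usually need a vertical flip before drawing in a top-left-origin space such as UIKit. The sketch below shows that conversion; `topLeftPoint(fromNormalized:imageSize:)` is a hypothetical helper name, not a Vision API.

```swift
import Foundation

/// Hypothetical helper: converts a Vision normalized point
/// (lower-left origin, 0.0-1.0) into pixel coordinates with a top-left origin.
func topLeftPoint(fromNormalized point: CGPoint, imageSize: CGSize) -> CGPoint {
    CGPoint(
        x: point.x * imageSize.width,           // scale x to pixels
        y: (1.0 - point.y) * imageSize.height   // flip y, then scale
    )
}

// Usage: a point one quarter from the left and three quarters up the image.
let converted = topLeftPoint(
    fromNormalized: CGPoint(x: 0.25, y: 0.75),
    imageSize: CGSize(width: 400, height: 200)
)
// converted is (100.0, 50.0)
```

Note that the built-in `VNImagePointForNormalizedPoint` scales to pixels but keeps Vision's lower-left origin, so a flip is still needed for top-left coordinate spaces.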

**Performance**: Perform requests on a background queue. Vision work is resource intensive and blocks the UI if it runs on the main thread.
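
One way to keep Vision work off the main thread is to wrap `perform` in a background dispatch and hop back for the result. The following is a sketch under assumptions: `performVisionRequests` and its `callbackQueue` parameter are hypothetical names, and the `.userInitiated` quality of service is only a reasonable default, not a requirement.

```swift
import Vision
import CoreGraphics
import Dispatch

/// Hypothetical helper: performs Vision requests on a background queue
/// and reports success or failure on `callbackQueue` (main by default).
func performVisionRequests(
    _ requests: [VNRequest],
    on image: CGImage,
    callbackQueue: DispatchQueue = .main,
    completion: @escaping (Result<Void, Error>) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cgImage: image)
        // `perform` is synchronous, so it blocks this background thread only.
        let result = Result { try handler.perform(requests) }
        callbackQueue.async { completion(result) }
    }
}
```

On success, each request's `results` property can be read inside the completion handler.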

## Subject Segmentation APIs

### VNGenerateForegroundInstanceMaskRequest

**Availability**: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+

Generates a class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.).

#### Basic Usage

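```swift
let request = VNGenerateForegroundInstanceMaskRequest()
// Handler and perform steps follow the core workflow; `image` is assumed to be a CGImage
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
```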
