subject segmentation, VNGenerateForegroundInstanceMaskRequest, isolate object from hand, VisionKit subject lifting, image foreground detection, instance masks, class-agnostic segmentation, VNRecognizeTextRequest, OCR, VNDetectBarcodesRequest, DataScannerViewController, document scanning, RecognizeDocumentsRequest
To install:

`npx add-skill https://github.com/CharlesWiltgen/Axiom/blob/main/.claude-plugin/plugins/axiom/skills/axiom-vision/SKILL.md -a claude-code --skill axiom-vision`

Installation path: `.claude/skills/axiom-vision/`

# Vision Framework Computer Vision

Guides you through implementing computer vision: subject segmentation, hand/body pose detection, person detection, text recognition, barcode detection, document scanning, and combining Vision APIs to solve complex problems.

## When to Use This Skill

Use when you need to:

- ☑ Isolate subjects from backgrounds (subject lifting)
- ☑ Detect and track hand poses for gestures
- ☑ Detect and track body poses for fitness/action classification
- ☑ Segment multiple people separately
- ☑ Exclude hands from object bounding boxes (combining APIs)
- ☑ Choose between VisionKit and the Vision framework
- ☑ Combine Vision with CoreImage for compositing
- ☑ Decide which Vision API solves your problem
- ☑ Recognize text in images (OCR)
- ☑ Detect barcodes and QR codes
- ☑ Scan documents with perspective correction
- ☑ Extract structured data from documents (iOS 26+)
- ☑ Build live scanning experiences (DataScannerViewController)

## Example Prompts

"How do I isolate a subject from the background?"

"I need to detect hand gestures like pinch"

"How can I get a bounding box around an object **without including the hand holding it**?"

"Should I use VisionKit or Vision framework for subject lifting?"

"How do I segment multiple people separately?"

"I need to detect body poses for a fitness app"

"How do I preserve HDR when compositing subjects on new backgrounds?"

"How do I recognize text in an image?"

"I need to scan QR codes from camera"

"How do I extract data from a receipt?"

"Should I use DataScannerViewController or Vision directly?"

"How do I scan documents and correct perspective?"

"I need to extract table data from a document"

## Red Flags

Signs you're making this harder than it needs to be:

- ❌ Manually implementing subject segmentation with CoreML models
- ❌ Using ARKit just for body pose (Vision works offline)
- ❌ Writing gesture recognition from scratch (use hand pose + simple distance checks; see the pinch sketch below)
- ❌ Processing on the main thread (blocks UI; Vision's `perform(_:)` is synchronous, so run it on a background queue; see the final sketch below)
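## Quick Sketches

The sketches below are minimal, hedged starting points for a few of the tasks above, not canonical implementations from this skill; function names like `liftSubject` and inputs like `cgImage` are placeholders. First, class-agnostic subject lifting with `VNGenerateForegroundInstanceMaskRequest` (iOS 17+ / macOS 14+):

```swift
import CoreVideo
import Vision

// A minimal sketch of class-agnostic subject lifting.
// `liftSubject` and `cgImage` are placeholder names.
func liftSubject(from cgImage: CGImage) throws -> CVPixelBuffer? {
    let request = VNGenerateForegroundInstanceMaskRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // A single observation describes all foreground instances found.
    guard let observation = request.results?.first else { return nil }

    // Returns the source image with the background removed.
    return try observation.generateMaskedImage(
        ofInstances: observation.allInstances,
        from: handler,
        croppedToInstancesExtent: false
    )
}
```

If you want to composite onto a new background with CoreImage instead, `generateScaledMaskForImage(forInstances:from:)` returns a soft mask suitable for `CIBlendWithMask`.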
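Detecting a pinch really is just hand pose plus a distance check, as the red flag above suggests. A sketch, assuming a 0.05 threshold in normalized image coordinates (tune it for your use case):

```swift
import CoreGraphics
import Vision

// Pinch = thumb tip and index tip close together in normalized coordinates.
func isPinching(in cgImage: CGImage) throws -> Bool {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    guard let hand = request.results?.first else { return false }
    let thumbTip = try hand.recognizedPoint(.thumbTip)
    let indexTip = try hand.recognizedPoint(.indexTip)

    // Skip low-confidence joints to avoid jittery false positives.
    guard thumbTip.confidence > 0.3, indexTip.confidence > 0.3 else { return false }

    let distance = hypot(thumbTip.location.x - indexTip.location.x,
                         thumbTip.location.y - indexTip.location.y)
    return distance < 0.05 // assumed threshold; tune it
}
```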
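For OCR, `VNRecognizeTextRequest` is usually all you need. A minimal sketch:

```swift
import Vision

// Recognize text regions and return the top candidate string for each.
func recognizeText(in cgImage: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // use .fast for live camera feeds
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```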
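Barcode and QR detection follows the same request/handler shape. The symbology list here is an illustrative subset; restricting it to what you actually expect improves performance (the lowercase symbology names assume iOS 15+):

```swift
import Vision

// Detect barcodes/QR codes and return their string payloads.
func detectBarcodes(in cgImage: CGImage) throws -> [String] {
    let request = VNDetectBarcodesRequest()
    request.symbologies = [.qr, .ean13, .code128] // example subset

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    return request.results?.compactMap { $0.payloadStringValue } ?? []
}
```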
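For live camera scanning, VisionKit's `DataScannerViewController` (iOS 16+) handles the camera session, highlighting, and guidance for you. A sketch of presenting one, assuming a UIKit host:

```swift
import UIKit
import VisionKit

// Present a live scanner for QR codes and text, if the device supports it.
func presentScanner(from presenter: UIViewController) {
    guard DataScannerViewController.isSupported,
          DataScannerViewController.isAvailable else { return }

    let scanner = DataScannerViewController(
        recognizedDataTypes: [.barcode(symbologies: [.qr]), .text()],
        qualityLevel: .balanced,
        isHighlightingEnabled: true
    )
    presenter.present(scanner, animated: true) {
        // Start the camera once the scanner is on screen.
        try? scanner.startScanning()
    }
}
```

Set `scanner.delegate` before presenting if you need tap and recognition callbacks.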
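Finally, the main-thread red flag: `VNImageRequestHandler.perform(_:)` blocks the calling thread. One common pattern is a dedicated serial queue (`visionQueue` is an assumed name) that any of the sketches above can run on:

```swift
import Foundation
import Vision

// A dedicated serial queue keeps Vision's synchronous perform(_:) off the main thread.
let visionQueue = DispatchQueue(label: "vision.processing", qos: .userInitiated)

func perform(_ requests: [VNRequest],
             on cgImage: CGImage,
             completion: @escaping (Result<Void, Error>) -> Void) {
    visionQueue.async {
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        let result = Result { try handler.perform(requests) }
        // Hop back to the main thread before touching UI; read request.results there.
        DispatchQueue.main.async { completion(result) }
    }
}
```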