
gem

verified

Multimodal AI processing using Google Gemini. Use it to analyze PDFs, images, videos, YouTube links, and large documents. Ideal for extracting information from files that require vision or multimodal understanding.


Marketplace

hamel

hamelsmu/hamel

Plugin

hamel-tools

Repository

hamelsmu/hamel
45 stars

plugins/hamel-tools/skills/gem/SKILL.md

Last Verified

January 21, 2026

Install Skill

npx add-skill https://github.com/hamelsmu/hamel/blob/main/plugins/hamel-tools/skills/gem/SKILL.md -a claude-code --skill gem

Installation paths:

Claude
.claude/skills/gem/

Instructions

# Gemini Multimodal Tool

Use the `ai-gem` CLI tool for multimodal AI processing via Google's Gemini API.

## Usage

```bash
# Text queries
ai-gem "Write a haiku about Python programming"

# Analyze documents
ai-gem "Summarize this document" document.pdf

# Analyze images
ai-gem "What's in this image?" photo.jpg

# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"

# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png

# Web search
ai-gem "Current AI news" --search
```
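When the same prompt needs to run against several files one at a time (rather than in a single comparison call), a small wrapper loop can help. The sketch below is only an illustration: `gem_each` and the `GEM_CMD` override are hypothetical names introduced here, not part of the `hamel` package, and the only `ai-gem` invocation it relies on is the documented `ai-gem "prompt" file` form.

```bash
# Hypothetical helper: run one prompt against each file in a separate call.
# GEM_CMD defaults to the documented `ai-gem` command; set it to `echo`
# to dry-run and see which calls would be made.
gem_each() {
  local prompt="$1"
  shift
  local cmd="${GEM_CMD:-ai-gem}"
  local f
  for f in "$@"; do
    # Print a header so each file's output can be told apart.
    printf '== %s ==\n' "$f"
    "$cmd" "$prompt" "$f" || echo "failed: $f" >&2
  done
}
```

For example, `gem_each "Summarize this document" report1.pdf report2.pdf` would issue two separate calls, whereas passing both files to one `ai-gem` call asks the model to consider them together.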

## Requirements

- The `GEMINI_API_KEY` environment variable must be set
- The `hamel` package must be installed: `pip install hamel`
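
Both requirements can be checked before the first call. The following is a minimal sketch, assuming a POSIX-like shell; `gem_preflight` is a hypothetical helper name, not something the `hamel` package provides.

```bash
# Hypothetical preflight check for the two requirements listed above.
# Returns 0 only if the API key is set and `ai-gem` is on PATH.
gem_preflight() {
  local status=0
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    status=1
  fi
  if ! command -v ai-gem >/dev/null 2>&1; then
    echo "ai-gem not found on PATH (try: pip install hamel)" >&2
    status=1
  fi
  return "$status"
}
```

A typical use would be `gem_preflight && ai-gem "Summarize this document" document.pdf`, so the query only runs when both checks pass.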

## Supported Input Types

- PDFs
- Images (PNG, JPEG, GIF, WebP)
- Videos (MP4, etc.)
- YouTube URLs
- Plain text files
- Multiple files for comparison
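
To catch unsupported inputs before spending an API call, arguments can be classified locally against the list above. This is a rough sketch under stated assumptions: `gem_input_kind` is a hypothetical helper, the `.txt` extension for plain text and the YouTube URL prefixes are guesses made for illustration, and `ai-gem` itself may detect types differently or accept more formats than the ones named here.

```bash
# Hypothetical local pre-check: map an argument to one of the input kinds
# listed above, based only on its URL prefix or file extension.
gem_input_kind() {
  local arg="$1"
  local lower
  # Lowercase the argument so that e.g. photo.JPG matches *.jpg.
  lower=$(printf '%s' "$arg" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    https://youtu.be/*|https://www.youtube.com/*|https://youtube.com/*)
      echo "youtube" ;;
    *.pdf) echo "pdf" ;;
    *.png|*.jpg|*.jpeg|*.gif|*.webp) echo "image" ;;
    *.mp4) echo "video" ;;   # only MP4 is named explicitly in the list
    *.txt) echo "text" ;;    # assumption: plain text files end in .txt
    *) echo "unknown" ;;
  esac
}
```

An argument that comes back as `unknown` is not necessarily unsupported; it only means the list above does not name it explicitly.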

Validation Details

Instruction Length: 800 chars