Generate images from text prompts and analyze images using vision models.
Maximum image file size for analysis: 20 MB (MAX_IMAGE_SIZE). Default timeout: 120 seconds. Zero vendor SDK dependencies — all provider APIs are called via direct fetch.
Providers:
  • Image generation: fal.ai FLUX models (default: fal-ai/flux/dev)
  • Vision analysis: OpenAI GPT-4.1-mini (default) or Anthropic Claude
Credential resolution (same order as email tools):
  1. Agent vault (e.g. service "fal-ai" with key "key" for fal.ai, service "openai" with key "key" for vision)
  2. Environment variables: FAL_KEY (generation), OPENAI_API_KEY or ANTHROPIC_API_KEY (vision)
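
The lookup order above can be sketched as follows. This is a minimal illustration, not the tool's implementation: `resolve_credential` and the dictionary-shaped vault are hypothetical stand-ins, since the source does not describe the vault's real interface.

```python
import os
from typing import Mapping, Optional, Sequence

# Hypothetical stand-in for the agent vault: {service: {key_name: secret}}.
Vault = Mapping[str, Mapping[str, str]]


def resolve_credential(
    vault: Vault,
    service: str,
    env_names: Sequence[str],
    key_name: str = "key",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the vault secret if present, else the first non-empty env var, else None."""
    env = os.environ if env is None else env
    # Step 1: the agent vault takes priority.
    secret = vault.get(service, {}).get(key_name)
    if secret:
        return secret
    # Step 2: fall back to environment variables, in the order given.
    for name in env_names:
        value = env.get(name)
        if value:
            return value
    return None


# Example with made-up secrets: the vault entry wins for fal.ai,
# while vision falls back to the environment.
example_vault = {"fal-ai": {"key": "vault-secret"}}
example_env = {"FAL_KEY": "env-secret", "OPENAI_API_KEY": "sk-env"}
fal_key = resolve_credential(example_vault, "fal-ai", ["FAL_KEY"], env=example_env)
vision_key = resolve_credential(example_vault, "openai", ["OPENAI_API_KEY"], env=example_env)
```
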
Enable via allowedTools:
{ "allowedTools": ["image_*"] }
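
To see which tool names a pattern such as "image_*" would cover, here is a small sketch. It assumes the pattern uses shell-style glob matching, which the source does not confirm; `is_tool_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_tool_allowed(tool_name: str, allowed_tools: Iterable[str]) -> bool:
    """Return True if tool_name matches any pattern (assumed glob semantics)."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)


allowed = ["image_*"]
generate_ok = is_tool_allowed("image_generate", allowed)
analyze_ok = is_tool_allowed("image_analyze", allowed)
```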

image_generate — Generate Image

Generate an image from a text prompt using fal.ai FLUX models.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | yes | Text description of the image to generate |
| path | string | yes | Destination file path (format inferred from extension: png, jpg, webp) |
| model | string | no | fal.ai model ID. Default: fal-ai/flux/dev. Options: fal-ai/flux-pro/v1.1 (best quality), fal-ai/flux/schnell (fastest) |
| size | string | no | Image dimensions as WIDTHxHEIGHT (default 1024x1024) |
| num_inference_steps | number | no | Inference steps: higher = better quality, slower (default varies by model) |
| guidance_scale | number | no | CFG scale: how closely to follow the prompt (default 3.5) |
| seed | number | no | Random seed for reproducible results |
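
As an illustration of two of these parameters, the sketch below parses a WIDTHxHEIGHT string and infers the output format from the destination path. `parse_size` and `infer_format` are hypothetical helpers, and the argument values are made up. The extension list mirrors the three formats named above; whether the tool accepts others is not stated.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# Extensions listed in the parameter description; others are treated as errors here.
SUPPORTED_FORMATS = {".png": "png", ".jpg": "jpg", ".webp": "webp"}


def parse_size(size: str = "1024x1024") -> Tuple[int, int]:
    """Parse 'WIDTHxHEIGHT' into a (width, height) tuple of positive integers."""
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if not match:
        raise ValueError(f"size must look like WIDTHxHEIGHT, got {size!r}")
    width, height = int(match[1]), int(match[2])
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height


def infer_format(path: str) -> str:
    """Infer the output format from the file extension (case-insensitive)."""
    ext = PurePosixPath(path).suffix.lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported extension {ext!r}")
    return SUPPORTED_FORMATS[ext]


# Example arguments for image_generate (illustrative values only).
args = {
    "prompt": "A lighthouse at dusk, oil painting",
    "path": "out/lighthouse.webp",
    "size": "1280x720",
    "seed": 42,
}
width, height = parse_size(args.get("size", "1024x1024"))
fmt = infer_format(args["path"])
```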

Returns

Confirmation with the output file path, file size, model, and dimensions.

Notes

  • Uses fal.ai async queue API (submit + poll) for reliable generation
  • Output format is inferred from the file extension (.png, .jpg, .webp)
  • Requires FAL_KEY env var or vault service "fal-ai" with credential key "key"
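
The submit + poll pattern mentioned in the notes can be sketched generically. This is not the fal.ai API itself: the three callables are injected placeholders for the real HTTP calls, the status strings "COMPLETED" and "FAILED" are assumptions for illustration, and the 120-second default simply mirrors the timeout stated at the top of this page.

```python
import time
from typing import Any, Callable


def submit_and_poll(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    get_result: Callable[[str], Any],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Submit a job, then poll its status until it completes, fails, or times out."""
    request_id = submit()
    deadline = clock() + timeout_s
    while True:
        status = get_status(request_id)
        if status == "COMPLETED":  # assumed terminal success status
            return get_result(request_id)
        if status == "FAILED":  # assumed terminal failure status
            raise RuntimeError(f"request {request_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"request {request_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in practice the callables would wrap the provider's submit, status, and result endpoints.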

image_analyze — Analyze Image

Analyze an image using a vision-language model.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | yes | Path to the image file |
| prompt | string | no | Analysis prompt (default "Describe this image in detail") |
| provider | enum: openai or anthropic | no | Vision provider (default openai) |
| model | string | no | Model name. OpenAI: gpt-4.1-mini (default). Anthropic: claude-sonnet-4-20250514 (default) |
| max_tokens | number | no | Maximum tokens in the response (default 1024) |
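
The defaults in this table depend on the chosen provider. A minimal sketch of how they combine, using a hypothetical `with_defaults` helper (not part of the tool):

```python
from typing import Any, Dict

# Provider-specific default models, as listed in the parameter table.
DEFAULT_MODELS = {
    "openai": "gpt-4.1-mini",
    "anthropic": "claude-sonnet-4-20250514",
}


def with_defaults(args: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults for image_analyze arguments."""
    if "path" not in args:
        raise ValueError("path is required")
    provider = args.get("provider", "openai")
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider {provider!r}")
    return {
        "path": args["path"],
        "prompt": args.get("prompt", "Describe this image in detail"),
        "provider": provider,
        # The default model follows the provider unless one is given explicitly.
        "model": args.get("model", DEFAULT_MODELS[provider]),
        "max_tokens": args.get("max_tokens", 1024),
    }


resolved = with_defaults({"path": "screenshots/login.png", "provider": "anthropic"})
```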

Returns

Text description or analysis of the image based on the prompt.

Notes

  • Images are base64-encoded and sent inline to the vision API
  • Anthropic supports only: jpeg, png, gif, webp
  • OpenAI supports a broader range of image formats
  • Use a specific prompt to guide the analysis (e.g. "Extract all text from this image", "Describe the UI layout")
  • Credentials are resolved from the vault first (service "openai" or "anthropic"), then from the OPENAI_API_KEY / ANTHROPIC_API_KEY environment variable
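
The inline base64 encoding described in the notes might look like the sketch below. `build_image_block` is a hypothetical helper. The block shapes follow the public Anthropic Messages API and OpenAI Chat Completions image formats, but the source does not say which endpoint or request shape the tool actually uses. Reading 20 MB as 20 * 1024 * 1024 bytes is also an assumption.

```python
import base64
from typing import Any, Dict

# Assumed byte value for the documented 20 MB limit (MAX_IMAGE_SIZE).
MAX_IMAGE_SIZE = 20 * 1024 * 1024

# Formats the notes list for Anthropic, mapped to MIME types.
ANTHROPIC_MEDIA_TYPES = {
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
    "webp": "image/webp",
}


def build_image_block(data: bytes, ext: str, provider: str) -> Dict[str, Any]:
    """Base64-encode image bytes and wrap them in a provider-specific content block."""
    if len(data) > MAX_IMAGE_SIZE:
        raise ValueError("image exceeds MAX_IMAGE_SIZE")
    ext = ext.lower().lstrip(".")
    encoded = base64.b64encode(data).decode("ascii")
    if provider == "anthropic":
        media_type = ANTHROPIC_MEDIA_TYPES.get(ext)
        if media_type is None:
            raise ValueError(f"Anthropic does not accept .{ext} images")
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": encoded},
        }
    if provider == "openai":
        mime = "image/jpeg" if ext in ("jpg", "jpeg") else f"image/{ext}"
        # OpenAI-style blocks carry the image as a data URL.
        return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}
    raise ValueError(f"unknown provider {provider!r}")
```

Checking the size before encoding avoids building a large base64 string for a file that would be rejected anyway.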