Generate images from text prompts and analyze images using vision models.
Maximum image file size for analysis: 20 MB (`MAX_IMAGE_SIZE`). Default timeout: 120 seconds. Zero vendor SDK dependencies — all provider APIs are called via direct `fetch`.
Providers:
- Image generation — fal.ai (FLUX models; default `fal-ai/flux/dev`)
- Vision analysis — OpenAI GPT-4.1-mini (default), Anthropic Claude
Credential resolution (same order as the email tools):
- Agent vault (e.g. service `"fal-ai"` with key `"key"` for fal.ai, service `"openai"` with key `"key"` for vision)
- Environment variables: `FAL_KEY` (generation), `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` (vision)
Enable via allowedTools: `{ "allowedTools": ["image_*"] }`
## image_generate — Generate Image
Generate an image from a text prompt using fal.ai FLUX models.
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | yes | Text description of the image to generate |
| `path` | string | yes | Destination file path (format inferred from extension: png, jpg, webp) |
| `model` | string | no | fal.ai model ID. Default: `fal-ai/flux/dev`. Options: `fal-ai/flux-pro/v1.1` (best quality), `fal-ai/flux/schnell` (fastest) |
| `size` | string | no | Image dimensions as WIDTHxHEIGHT (default 1024x1024) |
| `num_inference_steps` | number | no | Inference steps — higher means better quality but slower (default varies by model) |
| `guidance_scale` | number | no | CFG scale — how closely the model follows the prompt (default 3.5) |
| `seed` | number | no | Random seed for reproducible results |
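As a hypothetical illustration (the exact invocation syntax depends on your agent runtime and is not specified here), a call using these parameters might look like:

```json
{
  "tool": "image_generate",
  "arguments": {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "path": "./out/lighthouse.png",
    "model": "fal-ai/flux/dev",
    "size": "1024x1024",
    "guidance_scale": 3.5,
    "seed": 42
  }
}
```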
### Returns
Confirmation with the output file path, file size, model, and dimensions.
### Notes

- Uses the fal.ai async queue API (submit + poll) for reliable generation
- Output format is inferred from the file extension (`.png`, `.jpg`, `.webp`)
- Requires the `FAL_KEY` env var or vault service `"fal-ai"` with credential key `"key"`
## image_analyze — Analyze Image
Analyze an image using a vision-language model.
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `path` | string | yes | Path to the image file |
| `prompt` | string | no | Analysis prompt (default "Describe this image in detail") |
| `provider` | enum: `openai` \| `anthropic` | no | Vision provider (default `openai`) |
| `model` | string | no | Model name. OpenAI: `gpt-4.1-mini` (default). Anthropic: `claude-sonnet-4-20250514` (default) |
| `max_tokens` | number | no | Maximum tokens in the response (default 1024) |
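A hypothetical invocation (the call syntax is an assumption; only the parameter names come from the table above) might look like:

```json
{
  "tool": "image_analyze",
  "arguments": {
    "path": "./screenshots/dashboard.png",
    "prompt": "Describe the UI layout",
    "provider": "openai",
    "max_tokens": 1024
  }
}
```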
### Returns
Text description or analysis of the image based on the prompt.
### Notes

- Images are base64-encoded and sent inline to the vision API
- Anthropic supports only: jpeg, png, gif, webp
- OpenAI supports a broader range of image formats
- Use a specific `prompt` to guide the analysis (e.g. "Extract all text from this image", "Describe the UI layout")
- Credentials are resolved from: vault (service `"openai"` or `"anthropic"`) > `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` env var
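The "base64-encoded and sent inline" step can be sketched for the OpenAI case. This follows OpenAI's documented chat-completions image format (a `data:` URI inside an `image_url` content part); the helper name and its arguments are illustrative, not the tool's actual code.

```python
import base64
import mimetypes


def build_vision_message(image_bytes: bytes, path: str, prompt: str) -> dict:
    """Build an OpenAI-style chat message carrying the image inline as a base64 data URI."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }
```

Anthropic's Messages API uses a different content shape for the same idea (a `{"type": "image", "source": {"type": "base64", ...}}` block), which is why its narrower format list (jpeg, png, gif, webp) matters when choosing a provider.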