Setup
Get your API key from the GroqCloud console and set it as `GROQ_API_KEY`.

Config

Use the provider ID `groq`; the provider is also auto-inferred from the `llama-` model prefix.
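Because the API is OpenAI-compatible, a plain HTTP request works. A minimal stdlib sketch, assuming the standard Groq chat-completions endpoint (`https://api.groq.com/openai/v1/chat/completions`) and the `GROQ_API_KEY` variable from above; the `build_request` helper name is ours:

```python
import json
import os
import urllib.request

GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for Groq."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GROQ_CHAT_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending it requires a real key in GROQ_API_KEY:
# req = build_request("llama-3.3-70b-versatile",
#                     [{"role": "user", "content": "Say hi"}],
#                     os.environ["GROQ_API_KEY"])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library should also work by pointing its base URL at `https://api.groq.com/openai/v1`.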
Models
| Model | Best for | Context | Reasoning |
|---|---|---|---|
| llama-3.3-70b-versatile | Best open model on Groq, strong general quality | 128K | No |
| llama-3.1-8b-instant | Simple tasks, formatting, boilerplate | 128K | No |
| deepseek-r1-distill-llama-70b | Open-source reasoning | 128K | Yes |
| qwen/qwen3-32b | Strong open reasoning model | 128K | Yes |
| openai/gpt-oss-120b | Large open model, high quality | — | Yes |
| openai/gpt-oss-20b | Smaller open model, fast | — | Yes |
Also available: meta-llama/llama-4-scout-17b-16e-instruct, mistral-saba-24b, and gemma2-9b-it.
Features
| Feature | Supported |
|---|---|
| Streaming | Yes |
| Tool use | Yes |
| Vision (images) | Yes (Llama 4 Scout) |
| Reasoning | Yes (DeepSeek R1, Qwen 3, GPT-OSS) |
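Streamed responses arrive as OpenAI-style server-sent events, one `data:` line per delta. A sketch of collecting the text deltas, assuming the standard OpenAI streaming chunk shape (`choices[0].delta.content`) and a `[DONE]` sentinel; the `collect_stream` helper is ours:

```python
import json

def collect_stream(sse_lines):
    """Concatenate content deltas from OpenAI-style streaming SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```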
Pricing
Groq pricing is competitive and changes frequently; check groq.com/pricing for current rates. Generally, Groq is 5-10x cheaper than equivalent models on other providers due to its custom hardware.

Provider Details
| Setting | Value |
|---|---|
| Provider ID | groq |
| Env variable | GROQ_API_KEY |
| API type | OpenAI-compatible |
| Auto-infer prefix | llama- |
Use Case: Fast Worker Pattern
Groq is excellent for the "fast-worker" agent pattern: assign it tasks where speed matters more than peak quality, such as formatting, extraction, and boilerplate generation.

Notes
- Groq has rate limits that vary by model and account tier. For high-volume workloads, monitor your usage.
- Llama 3.3 70B on Groq often completes a response in under 2 seconds — fast enough that agent coordination overhead, not inference, becomes the bottleneck.
- Groq now hosts reasoning models (DeepSeek R1, Qwen 3, GPT-OSS) — strong open alternatives at high speed.
- Llama 4 Scout (MoE) supports vision inputs.
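The fast-worker pattern from the Use Case section can be sketched as a simple router over the model IDs listed above; the task schema and routing keywords here are illustrative, not part of any Groq API:

```python
def pick_model(task: dict) -> str:
    """Route a task to a Groq-hosted model by required capability.

    Model IDs come from the Models table; the task fields are illustrative.
    """
    if task.get("needs_vision"):
        return "meta-llama/llama-4-scout-17b-16e-instruct"  # vision-capable
    if task.get("needs_reasoning"):
        return "deepseek-r1-distill-llama-70b"  # open reasoning model
    if task.get("kind") in {"format", "boilerplate", "extract"}:
        return "llama-3.1-8b-instant"  # fastest, fine for simple tasks
    return "llama-3.3-70b-versatile"  # strong general default
```

A coordinator agent can call `pick_model` per subtask so that only the steps that need reasoning or vision pay for the heavier models.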