Groq runs open-source models on custom LPU (Language Processing Unit) hardware, delivering some of the fastest inference speeds available. Ideal for tasks where latency matters more than peak model quality.

Setup

Get your API key from GroqCloud.

```bash
export GROQ_API_KEY=gsk_...
```

Config

```json
{
  "providers": {
    "groq": "${GROQ_API_KEY}"
  }
}
```

Use it

```json
{
  "agents": [
    { "name": "fast-worker", "model": "groq:llama-3.3-70b-versatile" }
  ]
}
```

Polpo auto-infers `groq` from the `llama-` prefix.
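As a sketch of that auto-inference, the provider prefix can presumably be dropped for `llama-` model names (assuming prefix-less names are accepted, which the auto-infer rule above suggests):

```json
{
  "agents": [
    { "name": "fast-worker", "model": "llama-3.3-70b-versatile" }
  ]
}
```

Both forms resolve to the same Groq-hosted model; the explicit `groq:` prefix is simply unambiguous.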

Models

| Model | Best for | Context | Reasoning |
| --- | --- | --- | --- |
| `llama-3.3-70b-versatile` | Best open model on Groq, strong general quality | 128K | No |
| `llama-3.1-8b-instant` | Simple tasks, formatting, boilerplate | 128K | No |
| `deepseek-r1-distill-llama-70b` | Open-source reasoning | 128K | Yes |
| `qwen/qwen3-32b` | Strong open reasoning model | 128K | Yes |
| `openai/gpt-oss-120b` | Large open model, high quality | — | Yes |
| `openai/gpt-oss-20b` | Smaller open model, fast | — | Yes |

Also available: `meta-llama/llama-4-scout-17b-16e-instruct`, `mistral-saba-24b`, `gemma2-9b-it`.

Features

| Feature | Supported |
| --- | --- |
| Streaming | Yes |
| Tool use | Yes |
| Vision (images) | Yes (Llama 4 Scout) |
| Reasoning | Yes (DeepSeek R1, Qwen 3, GPT-OSS) |

Pricing

Groq pricing is competitive and changes frequently. Check groq.com/pricing for current rates. Generally, Groq is 5-10x cheaper than equivalent models on other providers due to their custom hardware.

Provider Details

| Field | Value |
| --- | --- |
| Provider ID | `groq` |
| Env variable | `GROQ_API_KEY` |
| API type | OpenAI-compatible |
| Auto-infer prefix | `llama-` |
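Because the API is OpenAI-compatible, any OpenAI-style client can talk to Groq directly by swapping the base URL. A minimal sketch (the request-body shape is standard chat-completions format; the commented-out send assumes the `openai` Python package is installed and a valid key is set):

```python
# Sketch: calling Groq through its OpenAI-compatible endpoint.
import os

GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a standard chat-completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("llama-3.3-70b-versatile", "Say hello")

# To actually send it (requires a valid GROQ_API_KEY):
# from openai import OpenAI
# client = OpenAI(base_url=GROQ_BASE_URL, api_key=os.environ["GROQ_API_KEY"])
# resp = client.chat.completions.create(**body)
```

This is also why tools built against the OpenAI API generally work with Groq unchanged.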

Use Case: Fast Worker Pattern

Groq is excellent for the “fast-worker” agent pattern. Assign it tasks where speed matters more than peak quality:
```json
{
  "agents": [
    {
      "name": "fast-worker",
      "model": "groq:llama-3.3-70b-versatile",
      "role": "Quick tasks, boilerplate, formatting, simple refactoring"
    },
    {
      "name": "senior-dev",
      "model": "anthropic:claude-sonnet-4-6",
      "role": "Complex implementation and architecture"
    }
  ]
}
```
The fast worker handles the simple tasks while the more capable model handles the complex ones. This reduces both cost and overall execution time.
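The split can be sketched as a routing function. This is a hypothetical heuristic for illustration (the keyword list and `route` helper are not part of Polpo), but it captures the idea: mechanical tasks go to the cheap, fast model and everything else escalates:

```python
# Hypothetical task router illustrating the fast-worker pattern.
AGENTS = {
    "fast-worker": "groq:llama-3.3-70b-versatile",
    "senior-dev": "anthropic:claude-sonnet-4-6",
}

def route(task: str) -> str:
    """Rough heuristic: short, mechanical tasks go to the fast worker."""
    simple_markers = ("format", "rename", "boilerplate", "lint")
    if any(marker in task.lower() for marker in simple_markers):
        return AGENTS["fast-worker"]
    return AGENTS["senior-dev"]
```

In practice the orchestrator makes this decision per task, but the cost/latency trade-off is the same: route by expected difficulty, not by default to the strongest model.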

Notes

  • Groq has rate limits that vary by model and account tier. For high-volume workloads, monitor your usage.
  • Llama 3.3 70B on Groq often completes a response in under 2 seconds — fast enough that agent coordination overhead, not inference, becomes the bottleneck.
  • Groq now hosts reasoning models (DeepSeek R1, Qwen 3, GPT-OSS) — strong open alternatives at high speed.
  • Llama 4 Scout (MoE) supports vision inputs.