Cerebras runs models on custom wafer-scale chips, delivering extremely fast inference. If raw speed is your priority, Cerebras is hard to beat.

Setup

Get your API key from Cerebras Cloud.
export CEREBRAS_API_KEY=...
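
Before wiring the key into config, it can help to confirm the variable is actually exported in the current shell. A minimal sketch (no network call, just an environment check):

```shell
# Sanity check: is CEREBRAS_API_KEY exported in this shell?
if [ -z "$CEREBRAS_API_KEY" ]; then
  echo "CEREBRAS_API_KEY is not set"
else
  echo "CEREBRAS_API_KEY is set"
fi
```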

Config

{
  "providers": {
    "cerebras": "${CEREBRAS_API_KEY}"
  }
}

Use it

{
  "agents": [
    { "name": "fast-worker", "model": "cerebras:gpt-oss-cerebras" }
  ]
}

Models

Model              Best for
gpt-oss-cerebras   Fast general-purpose inference
Llama variants     Open model inference at speed

Features

Feature           Supported
Streaming         Yes
Tool use          Yes
Vision (images)   No

Provider Details

Provider ID         cerebras
Env variable        CEREBRAS_API_KEY
API type            OpenAI-compatible
Auto-infer prefix   gpt-oss-
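
Because the API is OpenAI-compatible, any OpenAI-style client can target Cerebras by pointing at its base URL. A minimal sketch of building a chat-completion request body; the base URL is an assumption, and the model name simply reuses the id from the table above (the raw API name may differ — check the Cerebras docs):

```python
import json
import os

def build_chat_request(prompt: str, model: str = "gpt-oss-cerebras") -> dict:
    """Assemble a JSON body for a POST to {base_url}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # streaming is supported (see Features table)
    }

# Assumed OpenAI-compatible endpoint; confirm against Cerebras documentation.
base_url = "https://api.cerebras.ai/v1"
headers = {
    "Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}",
    "Content-Type": "application/json",
}
body = json.dumps(build_chat_request("Summarize this diff in one line."))
```

Any HTTP client (or the official OpenAI SDK with `base_url` overridden) can then send `body` with `headers` to `{base_url}/chat/completions`.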

Notes

  • Cerebras is one of the fastest inference providers available. Time-to-first-token is often under 100ms.
  • Good for the same “fast-worker” pattern as Groq — assign high-volume, simpler tasks to Cerebras agents.