Polpo automatically switches to backup models when a provider fails. This covers rate limits, auth errors, billing issues, and server outages — your agents keep running even when a provider goes down.
Model Fallback Chains
Instead of specifying a single model string, you can use a ModelConfig object with a primary model and ordered fallbacks:
{
  "settings": {
    "orchestratorModel": {
      "primary": "anthropic:claude-sonnet-4-6",
      "fallbacks": [
        "openai:gpt-4o",
        "google:gemini-2.5-pro",
        "groq:llama-3.3-70b-versatile"
      ]
    }
  }
}
How Resolution Works
- Primary model is tried first
- If the primary’s provider has no API key or is in cooldown, try the first fallback
- Continue through the fallback list in order
- If all providers fail or are in cooldown, throw the last error
interface ModelConfig {
  /** Primary model spec (e.g. "anthropic:claude-opus-4-6"). */
  primary?: string;
  /** Ordered fallback models, tried when the primary fails. */
  fallbacks?: string[];
}
Put your preferred model first, a similar-capability alternative second, and a free/always-available model last as an emergency fallback.
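The resolution order above can be sketched as a pure function. This is a minimal illustration, not Polpo's implementation: `eligibleModels`, `providerOf`, and the two predicate parameters are hypothetical names, and the sketch assumes the provider is the part of the spec before the first colon.

```typescript
// Hypothetical sketch of the resolution order; not Polpo's actual code.
interface ModelConfig {
  primary?: string;
  fallbacks?: string[];
}

/** Returns the provider prefix of a "provider:model" spec (assumed format). */
function providerOf(spec: string): string {
  const i = spec.indexOf(":");
  return i === -1 ? spec : spec.slice(0, i);
}

/** Lists the models that would be tried, in order, skipping unusable providers. */
function eligibleModels(
  config: ModelConfig,
  hasApiKey: (provider: string) => boolean,
  inCooldown: (provider: string) => boolean,
): string[] {
  // Primary first, then fallbacks in their configured order.
  const ordered = [config.primary, ...(config.fallbacks ?? [])].filter(
    (spec): spec is string => typeof spec === "string",
  );
  return ordered.filter((spec) => {
    const provider = providerOf(spec);
    return hasApiKey(provider) && !inCooldown(provider);
  });
}

const order = eligibleModels(
  {
    primary: "anthropic:claude-sonnet-4-6",
    fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"],
  },
  (provider) => provider !== "google", // pretend no Google key is configured
  (provider) => provider === "anthropic", // pretend Anthropic is cooling down
);
console.log(order); // only "openai:gpt-4o" remains
```

An empty result corresponds to the last step of the list: nothing can be tried, so the caller surfaces the most recent error.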
Fallback Triggers
Failover happens when:
- The provider’s API key is missing (checked before making the call)
- The provider is currently in cooldown (from a recent error)
- The API call returns a retriable error (rate limit, auth, billing, server error)
Failover does not happen for:
- Client errors (400 Bad Request, invalid model ID) — these indicate a config problem, not a provider issue
- Network transient errors (timeout, connection reset) — these are retried on the same provider first
Provider Cooldown
When a provider returns certain errors, Polpo puts it into cooldown — a temporary period where that provider is skipped in favor of alternatives.
Cooldown Backoff
Cooldown uses exponential backoff to avoid hammering a failing provider:
| Consecutive Failures | Cooldown Duration |
|---|---|
| 1st failure | 1 minute |
| 2nd failure | 5 minutes |
| 3rd failure | 25 minutes |
| 4th+ failure | 1 hour (cap) |
A successful call to a provider resets its cooldown counter to zero.
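The durations in the table match a five-fold increase per consecutive failure, capped at one hour. The sketch below reproduces the table under that assumption; `cooldownMs` is a hypothetical helper, and the multiplier is inferred from the listed values rather than taken from Polpo's source.

```typescript
// Hypothetical helper that reproduces the cooldown table above.
const MINUTE_MS = 60000;
const COOLDOWN_CAP_MS = 60 * MINUTE_MS;

/** Cooldown length for the Nth consecutive failure (1-based). */
function cooldownMs(consecutiveFailures: number): number {
  if (consecutiveFailures <= 0) return 0; // counter was reset by a success
  const uncapped = MINUTE_MS * Math.pow(5, consecutiveFailures - 1);
  return Math.min(uncapped, COOLDOWN_CAP_MS);
}

for (let n = 1; n <= 4; n += 1) {
  console.log(n, cooldownMs(n) / MINUTE_MS, "minutes");
}
```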
Cooldown Triggers
| Error Type | Triggers Cooldown? | Triggers Failover? |
|---|---|---|
| Auth error (401, 403) | Yes | Yes |
| Rate limit (429) | Yes | Yes |
| Billing (402, insufficient credits) | Yes | Yes |
| Server error (500, 502, 503, 504) | Yes | Yes |
| Network transient (timeout, ECONNRESET) | No | No (retried) |
| Client error (400, 404, invalid model) | No | No (thrown) |
Monitoring Cooldowns
You can inspect which providers are currently in cooldown:
import { getProviderCooldowns } from "polpo/llm";
const cooldowns = getProviderCooldowns();
// → {
//   "anthropic": { until: 1708012800000, errorCount: 2, reason: "rate_limit" },
//   "openai": { until: 1708012500000, errorCount: 1, reason: "server_error" }
// }
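To turn that map into something readable, for example in a status log, you could format the remaining time per provider. The sketch below works on a plain object with the shape shown above. It assumes `until` is a Unix timestamp in milliseconds, which the sample values suggest but the text does not state; `CooldownEntry` and `describeCooldowns` are hypothetical names.

```typescript
// Hypothetical formatter for the cooldown map; the entry shape mirrors the sample output.
interface CooldownEntry {
  until: number; // assumed: epoch milliseconds
  errorCount: number;
  reason: string;
}

/** Returns one line per provider that is still cooling down at `now`. */
function describeCooldowns(
  cooldowns: Record<string, CooldownEntry>,
  now: number,
): string[] {
  const lines: string[] = [];
  for (const provider of Object.keys(cooldowns)) {
    const entry = cooldowns[provider];
    if (!entry || entry.until <= now) continue; // cooldown already expired
    const seconds = Math.ceil((entry.until - now) / 1000);
    lines.push(
      `${provider}: ${entry.reason}, ${entry.errorCount} error(s), ${seconds}s remaining`,
    );
  }
  return lines;
}
```

Passing `Date.now()` as `now` gives the live view; taking it as a parameter keeps the function deterministic for tests.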
Error Classification
Polpo classifies every provider error to determine the correct response:
// Assumes queryText is exported from the same module as classifyProviderError.
import { classifyProviderError, queryText } from "polpo/llm";
try {
  await queryText("hello", "anthropic:claude-sonnet-4-6");
} catch (err) {
  const classified = classifyProviderError(err);
  // e.g. for a 429 response:
  // → { shouldCooldown: true, shouldFailover: true, reason: "rate_limit" }
}
Classification Rules
| Pattern | Reason | Action |
|---|---|---|
| 401, unauthorized, invalid api key, forbidden, 403 | auth | Cooldown + failover |
| 429, rate limit, too many requests, quota exceeded | rate_limit | Cooldown + failover |
| 402, insufficient, credit, billing, payment required | billing | Cooldown + failover |
| 500, 502, 503, 504, overloaded, service unavailable | server_error | Cooldown + failover |
| timeout, econnreset, econnrefused, socket hang up | network | Retry (no cooldown) |
| 400, invalid, not found, 404 | client_error | Throw (no retry) |
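One way to read the table is as an ordered list of substring rules where the first match wins. The sketch below is an illustration only: `classifyMessage` and `RULES` are hypothetical, and the real `classifyProviderError` may inspect status codes or error types instead of message text. Order matters here, because "invalid api key" has to match the auth rule before the broader "invalid" pattern in the client-error rule.

```typescript
// Hypothetical first-match-wins classifier built from the table above.
type Reason =
  | "auth" | "rate_limit" | "billing" | "server_error" | "network" | "client_error";

interface Classification {
  reason: Reason;
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

const RULES: Array<{ reason: Reason; patterns: string[] }> = [
  { reason: "auth", patterns: ["401", "unauthorized", "invalid api key", "forbidden", "403"] },
  { reason: "rate_limit", patterns: ["429", "rate limit", "too many requests", "quota exceeded"] },
  { reason: "billing", patterns: ["402", "insufficient", "credit", "billing", "payment required"] },
  { reason: "server_error", patterns: ["500", "502", "503", "504", "overloaded", "service unavailable"] },
  { reason: "network", patterns: ["timeout", "econnreset", "econnrefused", "socket hang up"] },
  { reason: "client_error", patterns: ["400", "invalid", "not found", "404"] },
];

/** Classifies an error message, or returns undefined when no rule matches. */
function classifyMessage(message: string): Classification | undefined {
  const text = message.toLowerCase();
  for (const rule of RULES) {
    if (rule.patterns.some((pattern) => text.indexOf(pattern) !== -1)) {
      // Only provider-side problems trigger cooldown and failover.
      const providerSide = rule.reason !== "network" && rule.reason !== "client_error";
      return { reason: rule.reason, shouldCooldown: providerSide, shouldFailover: providerSide };
    }
  }
  return undefined;
}

console.log(classifyMessage("429 Too Many Requests"));
```

Plain substring matching is fragile: a request ID that happens to contain "500" would be read as a server error, so a production classifier would more likely check the HTTP status field first.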
Query with Fallback
The queryTextWithFallback function combines fallback chains with cooldown management:
import { queryTextWithFallback } from "polpo/llm";
const result = await queryTextWithFallback("Explain this code...", {
  primary: "anthropic:claude-sonnet-4-6",
  fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"]
});
console.log(result.text); // The response
console.log(result.usedSpec); // Which model actually answered, e.g. "openai:gpt-4o"
console.log(result.usage); // Token usage
console.log(result.model); // Full model metadata
Flow
queryTextWithFallback("prompt", { primary: "anthropic:...", fallbacks: ["openai:...", "google:..."] })
│
├─ Try anthropic
│ ├─ In cooldown? → skip
│ ├─ Success? → return result, clear cooldown
│ └─ Error? → classify
│ ├─ shouldCooldown → markProviderCooldown("anthropic")
│ ├─ shouldFailover → try next
│ └─ else → throw immediately
│
├─ Try openai
│ └─ (same logic)
│
├─ Try google
│ └─ (same logic)
│
└─ All failed → throw last error
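The flow above maps onto a short loop. The sketch below is a simplified stand-in for `queryTextWithFallback`, not its source: `tryInOrder` is a hypothetical name, the model call and the classifier are injected as parameters, and cooldown is reduced to a set of provider names with no timing.

```typescript
// Hypothetical, simplified version of the fallback flow shown above.
interface Classified {
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

function providerOf(spec: string): string {
  const i = spec.indexOf(":");
  return i === -1 ? spec : spec.slice(0, i);
}

async function tryInOrder<T>(
  specs: string[],
  call: (spec: string) => Promise<T>,
  classify: (err: unknown) => Classified,
  cooldowns: Set<string>,
): Promise<{ result: T; usedSpec: string }> {
  let lastError: unknown = new Error("no model could be tried");
  for (const spec of specs) {
    const provider = providerOf(spec);
    if (cooldowns.has(provider)) continue; // in cooldown: skip
    try {
      const result = await call(spec);
      cooldowns.delete(provider); // success: clear any failure state
      return { result, usedSpec: spec };
    } catch (err) {
      lastError = err;
      const classified = classify(err);
      if (classified.shouldCooldown) cooldowns.add(provider);
      if (!classified.shouldFailover) throw err; // e.g. client errors: throw immediately
    }
  }
  throw lastError; // all candidates failed or were skipped
}
```

Injecting `call` and `classify` keeps the loop independent of any particular SDK, which also makes the failover order easy to unit test.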
Billing Disable
Billing errors (insufficient credits, quota exceeded, 402 Payment Required) trigger a separate disable mechanism with much longer backoff than regular cooldown. This is because billing issues are persistent — retrying in 1 minute won’t help if your credits are depleted.
Billing Backoff
| Consecutive Billing Failures | Disable Duration |
|---|---|
| 1st failure | 5 hours |
| 2nd failure | 10 hours |
| 3rd failure | 20 hours |
| 4th+ failure | 24 hours (cap) |
The billing error counter resets after 24 hours without a billing failure.
vs. Regular Cooldown
| | Regular Cooldown | Billing Disable |
|---|---|---|
| Triggers | Rate limit, auth error, server error | Insufficient credits, quota exceeded |
| Backoff curve | 1min -> 5min -> 25min -> 1h | 5h -> 10h -> 20h -> 24h |
| Assumption | Transient — will resolve quickly | Persistent — needs payment or plan upgrade |
| Level | Provider-level | Auth profile-level |
Both mechanisms are tracked per auth profile, not per provider. If one API key is out of credits but another has quota, Polpo automatically rotates to the working one.
Billing disable timing can be configured in polpo.json:
{
  "settings": {
    "billingDisable": {
      "billingBackoffHours": 5,
      "billingMaxHours": 24,
      "failureWindowHours": 24
    }
  }
}
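The billing backoff table is consistent with doubling from `billingBackoffHours` up to `billingMaxHours`, with the counter starting over once `failureWindowHours` pass without a billing failure. The sketch below encodes that reading. The function names are hypothetical, and the mapping from settings to behavior is an inference from the table and the default values, not documented semantics.

```typescript
// Hypothetical interpretation of the billingDisable settings.
interface BillingDisableSettings {
  billingBackoffHours: number; // assumed: duration after the first failure
  billingMaxHours: number; // assumed: cap on the disable duration
  failureWindowHours: number; // assumed: quiet period that resets the counter
}

const HOUR_MS = 3600000;

/** Disable duration in hours for the Nth consecutive billing failure (1-based). */
function billingDisableHours(failures: number, s: BillingDisableSettings): number {
  if (failures <= 0) return 0;
  return Math.min(s.billingBackoffHours * Math.pow(2, failures - 1), s.billingMaxHours);
}

/** Failure count to use for a new billing failure, applying the reset window. */
function nextFailureCount(
  previousCount: number,
  lastFailureAt: number,
  now: number,
  s: BillingDisableSettings,
): number {
  const stale = now - lastFailureAt >= s.failureWindowHours * HOUR_MS;
  return (stale ? 0 : previousCount) + 1;
}

const defaults: BillingDisableSettings = {
  billingBackoffHours: 5,
  billingMaxHours: 24,
  failureWindowHours: 24,
};
console.log([1, 2, 3, 4].map((n) => billingDisableHours(n, defaults)));
```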
Auth Profile Rotation
When you have multiple auth profiles for the same provider (e.g., personal + work OAuth, OAuth + API key), Polpo selects the best profile for each request using a rotation algorithm:
- OAuth profiles before API key profiles: OAuth supports auto-refresh
- Oldest lastUsed first: round-robin for even distribution
- Skip profiles in cooldown: temporary errors
- Skip billing-disabled profiles: insufficient credits
This ensures:
- Even usage distribution across credentials
- Automatic failover when one credential hits rate limits
- No wasted calls on billing-disabled profiles
See the Authentication guide for full details on profiles, rotation, and session stickiness.
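The selection rules in this section amount to a filter followed by a two-key sort. The sketch below illustrates that idea with a hypothetical `AuthProfile` shape: only `lastUsed` is named in this guide, and the other fields and the `pickProfile` function are assumptions made for the example.

```typescript
// Hypothetical profile shape; only `lastUsed` appears in the guide.
interface AuthProfile {
  id: string;
  type: "oauth" | "api_key";
  lastUsed: number; // epoch ms, 0 if never used
  cooldownUntil?: number;
  billingDisabledUntil?: number;
}

/** Picks the profile to use at `now`, or undefined if none is usable. */
function pickProfile(profiles: AuthProfile[], now: number): AuthProfile | undefined {
  // Skip profiles that are cooling down or billing-disabled.
  const usable = profiles.filter(
    (p) => (p.cooldownUntil ?? 0) <= now && (p.billingDisabledUntil ?? 0) <= now,
  );
  // OAuth before API keys, then least recently used first.
  usable.sort((a, b) => {
    if (a.type !== b.type) return a.type === "oauth" ? -1 : 1;
    return a.lastUsed - b.lastUsed;
  });
  return usable[0];
}

const profiles: AuthProfile[] = [
  { id: "work-oauth", type: "oauth", lastUsed: 200 },
  { id: "personal-oauth", type: "oauth", lastUsed: 100, cooldownUntil: 5000 },
  { id: "api-key", type: "api_key", lastUsed: 50 },
  { id: "old-oauth", type: "oauth", lastUsed: 150 },
];
console.log(pickProfile(profiles, 1000)?.id); // "old-oauth"
```

Updating `lastUsed` after each request is what turns the second sort key into round-robin.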
Escalation Model Override
Orthogonal to fallback chains, Polpo supports escalation model override — when a task fails enough times, the retry uses a more capable (usually more expensive) model:
{
  "retryPolicy": {
    "escalateAfter": 2,
    "fallbackAgent": "senior-dev",
    "escalateModel": "anthropic:claude-opus-4-6"
  }
}
This is different from provider failover:
- Fallback chains handle provider-level failures (rate limits, outages)
- Escalation handles task-level failures (agent produced wrong output)
Both can work together — an escalated model can itself have a fallback chain configured.
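As a rough illustration of how the policy could drive model choice, the sketch below picks the model for the next attempt from the number of task failures so far. `modelForAttempt` is hypothetical, and the threshold is assumed to be inclusive (escalate once failures reach `escalateAfter`), which this guide does not specify.

```typescript
// Hypothetical reading of retryPolicy; the inclusive threshold is an assumption.
interface RetryPolicy {
  escalateAfter: number;
  fallbackAgent?: string;
  escalateModel?: string;
}

/** Chooses the model for the next attempt given the task's failure count. */
function modelForAttempt(baseModel: string, failures: number, policy: RetryPolicy): string {
  if (policy.escalateModel !== undefined && failures >= policy.escalateAfter) {
    return policy.escalateModel;
  }
  return baseModel;
}

const policy: RetryPolicy = {
  escalateAfter: 2,
  fallbackAgent: "senior-dev",
  escalateModel: "anthropic:claude-opus-4-6",
};
console.log(modelForAttempt("anthropic:claude-sonnet-4-6", 1, policy));
console.log(modelForAttempt("anthropic:claude-sonnet-4-6", 2, policy));
```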
Retry Integration
Fallback works on top of Polpo’s existing retry system. The full error handling pipeline is:
- Retry: withRetry retries transient network errors (up to 2 times)
- Failover: queryTextWithFallback switches providers on classified errors
- Cooldown: Failed providers are temporarily skipped
- Escalation: After repeated task failures, Polpo switches to a more capable model
When using querySDKWithFallback, the retry count is reduced to 1 (instead of 2) since the fallback mechanism already provides resilience against provider failures.
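The retry layer at the bottom of this pipeline can be pictured as a small wrapper that re-runs a call only when the error is transient. The sketch below is not Polpo's `withRetry`: `retryTransient` is a hypothetical stand-in with an assumed signature, and it retries immediately, whereas a real implementation would normally wait between attempts.

```typescript
// Hypothetical stand-in for the retry layer; signature and behavior are assumptions.
async function retryTransient<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxRetries = 2,
): Promise<T> {
  let retries = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (retries >= maxRetries || !isTransient(err)) throw err;
      retries += 1;
    }
  }
}
```

With `maxRetries` set to 1, the same wrapper models the reduced retry count used when a fallback chain is already in place.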