Fallback & Failover

Polpo automatically switches to backup models when a provider fails. This covers rate limits, auth errors, billing issues, and server outages — your agents keep running even when a provider goes down.

Model Fallback Chains

Instead of specifying a single model string, you can use a ModelConfig object with a primary model and ordered fallbacks:

{
  "settings": {
    "orchestratorModel": {
      "primary": "anthropic:claude-sonnet-4-6",
      "fallbacks": [
        "openai:gpt-4o",
        "google:gemini-2.5-pro",
        "groq:llama-3.3-70b-versatile"
      ]
    }
  }
}

How Resolution Works

The primary model is tried first.
If the primary’s provider has no API key, is in cooldown, or returns a retriable error, the first fallback is tried.
Resolution continues through the fallback list in order.
If all providers fail or are in cooldown, the last error is thrown.

interface ModelConfig {
  /** Primary model spec (e.g. "anthropic:claude-opus-4-6"). */
  primary?: string;
  /** Ordered fallback models — tried when primary fails. */
  fallbacks?: string[];
}
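
The pre-call part of this resolution order (missing API key, provider in cooldown) can be sketched as a small filter over the configured chain. This is an illustrative sketch, not Polpo's implementation: `resolveCandidates`, `hasApiKey`, and `inCooldown` are hypothetical names, and the provider is assumed to be the part of the model spec before the first colon.

```typescript
// Mirrors the ModelConfig shape shown above.
interface ModelConfig {
  primary?: string;
  fallbacks?: string[];
}

// Hypothetical helper: returns the model specs worth trying, in order.
function resolveCandidates(
  config: ModelConfig,
  hasApiKey: (provider: string) => boolean,
  inCooldown: (provider: string) => boolean,
): string[] {
  // Primary first, then the fallbacks in their configured order.
  const ordered = [config.primary, ...(config.fallbacks ?? [])].filter(
    (spec): spec is string => typeof spec === "string" && spec.length > 0,
  );
  return ordered.filter((spec) => {
    const idx = spec.indexOf(":"); // assumed "provider:model" format
    const provider = idx === -1 ? spec : spec.slice(0, idx);
    return hasApiKey(provider) && !inCooldown(provider);
  });
}

// Example: anthropic is cooling down and google has no key configured.
const candidates = resolveCandidates(
  {
    primary: "anthropic:claude-sonnet-4-6",
    fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"],
  },
  (provider) => provider !== "google",
  (provider) => provider === "anthropic",
);
console.log(candidates); // → [ 'openai:gpt-4o' ]
```

Errors returned by the API call itself are handled later, once a candidate has actually been tried.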

Put your preferred model first, a similar-capability alternative second, and a free/always-available model last as an emergency fallback.

Fallback Triggers

Failover happens when:

The provider’s API key is missing (checked before making the call)
The provider is currently in cooldown (from a recent error)
The API call returns a retriable error (rate limit, auth, billing, server error)

Failover does not happen for:

Client errors (400 Bad Request, invalid model ID) — these indicate a config problem, not a provider issue
Network transient errors (timeout, connection reset) — these are retried on the same provider first

Provider Cooldown

When a provider returns certain errors, Polpo puts it into cooldown — a temporary period where that provider is skipped in favor of alternatives.

Cooldown Backoff

Cooldown uses exponential backoff to avoid hammering a failing provider:

Consecutive Failures	Cooldown Duration
1st failure	1 minute
2nd failure	5 minutes
3rd failure	25 minutes
4th+ failure	1 hour (cap)

A successful call to a provider resets its cooldown counter to zero.
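
The durations in the table are consistent with a rule of "multiply by 5 per consecutive failure, capped at 1 hour". Assuming that is the underlying rule (the source only gives the table), a minimal sketch looks like this; `cooldownMs` and the constants are hypothetical names.

```typescript
const MINUTE_MS = 60000;
const COOLDOWN_BASE_MS = 1 * MINUTE_MS; // 1st failure: 1 minute
const COOLDOWN_FACTOR = 5;              // assumed growth factor per failure
const COOLDOWN_CAP_MS = 60 * MINUTE_MS; // never longer than 1 hour

// Hypothetical helper: cooldown length for the Nth consecutive failure (N >= 1).
function cooldownMs(consecutiveFailures: number): number {
  if (consecutiveFailures < 1) return 0; // counter was reset by a success
  const raw = COOLDOWN_BASE_MS * Math.pow(COOLDOWN_FACTOR, consecutiveFailures - 1);
  return Math.min(raw, COOLDOWN_CAP_MS);
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log("failure " + n + ": " + cooldownMs(n) / MINUTE_MS + " min");
}
// failure 1: 1 min
// failure 2: 5 min
// failure 3: 25 min
// failure 4: 60 min
// failure 5: 60 min
```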

Cooldown Triggers

Error Type	Triggers Cooldown?	Triggers Failover?
Auth error (401, 403)	Yes	Yes
Rate limit (429)	Yes	Yes
Billing (402, insufficient credits)	Yes	Yes
Server error (500, 502, 503, 504)	Yes	Yes
Network transient (timeout, ECONNRESET)	No	No (retried)
Client error (400, 404, invalid model)	No	No (thrown)

Monitoring Cooldowns

You can inspect which providers are currently in cooldown:

import { getProviderCooldowns } from "polpo/llm";

const cooldowns = getProviderCooldowns();
// → {
//   "anthropic": { until: 1708012800000, errorCount: 2, reason: "rate_limit" },
//   "openai": { until: 1708012500000, errorCount: 1, reason: "server_error" }
// }
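
As a usage sketch, the returned map can be turned into a short status report. The entry shape below is inferred from the example output above, and `until` is assumed to be a Unix timestamp in milliseconds. `describeCooldowns` is a hypothetical helper, and the sample object stands in for a real `getProviderCooldowns()` result so the snippet runs on its own.

```typescript
// Shape inferred from the example output; `until` is assumed to be epoch milliseconds.
interface CooldownEntry {
  until: number;
  errorCount: number;
  reason: string;
}

// Hypothetical helper: one line per provider that is still in cooldown at `now`.
function describeCooldowns(cooldowns: Record<string, CooldownEntry>, now: number): string[] {
  return Object.keys(cooldowns)
    .filter((provider) => cooldowns[provider].until > now)
    .map((provider) => {
      const entry = cooldowns[provider];
      const secondsLeft = Math.ceil((entry.until - now) / 1000);
      return provider + ": " + entry.reason + ", " + entry.errorCount + " error(s), " + secondsLeft + "s left";
    });
}

// Sample data standing in for getProviderCooldowns().
const sampleCooldowns: Record<string, CooldownEntry> = {
  anthropic: { until: 1708012800000, errorCount: 2, reason: "rate_limit" },
  openai: { until: 1708012500000, errorCount: 1, reason: "server_error" },
};

const cooldownLines = describeCooldowns(sampleCooldowns, 1708012560000);
console.log(cooldownLines); // → [ 'anthropic: rate_limit, 2 error(s), 240s left' ]
```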

Error Classification

Polpo classifies every provider error to determine the correct response:

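// queryText is assumed to be exported from the same module as classifyProviderError.
import { classifyProviderError, queryText } from "polpo/llm";

try {
  await queryText("hello", "anthropic:claude-sonnet-4-6");
} catch (err) {
  // Classify the failure to see whether it triggers cooldown and/or failover.
  const classified = classifyProviderError(err);
  // e.g. for a 429 response:
  // → { shouldCooldown: true, shouldFailover: true, reason: "rate_limit" }
}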

Classification Rules

Pattern	Reason	Action
`401`, `unauthorized`, `invalid api key`, `forbidden`, `403`	`auth`	Cooldown + failover
`429`, `rate limit`, `too many requests`, `quota exceeded`	`rate_limit`	Cooldown + failover
`402`, `insufficient`, `credit`, `billing`, `payment required`	`billing`	Cooldown + failover
`500`, `502`, `503`, `504`, `overloaded`, `service unavailable`	`server_error`	Cooldown + failover
`timeout`, `econnreset`, `econnrefused`, `socket hang up`	`network`	Retry (no cooldown)
`400`, `invalid`, `not found`, `404`	`client_error`	Throw (no retry)
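
To make the table concrete, here is a minimal sketch of a pattern-based classifier. It assumes case-insensitive substring matching on the error message and that rows are checked in table order with the first match winning. Neither detail is confirmed by the source, and `classifyMessage`, `RULES`, and the `unknown` reason are hypothetical.

```typescript
type Reason = "auth" | "rate_limit" | "billing" | "server_error" | "network" | "client_error" | "unknown";

interface Classification {
  reason: Reason;
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

// Rules in table order. First match wins, so "invalid api key" is classified
// as auth before the broader "invalid" client-error pattern is reached.
const RULES: Array<{ reason: Reason; patterns: string[]; cooldown: boolean; failover: boolean }> = [
  { reason: "auth", patterns: ["401", "unauthorized", "invalid api key", "forbidden", "403"], cooldown: true, failover: true },
  { reason: "rate_limit", patterns: ["429", "rate limit", "too many requests", "quota exceeded"], cooldown: true, failover: true },
  { reason: "billing", patterns: ["402", "insufficient", "credit", "billing", "payment required"], cooldown: true, failover: true },
  { reason: "server_error", patterns: ["500", "502", "503", "504", "overloaded", "service unavailable"], cooldown: true, failover: true },
  { reason: "network", patterns: ["timeout", "econnreset", "econnrefused", "socket hang up"], cooldown: false, failover: false },
  { reason: "client_error", patterns: ["400", "invalid", "not found", "404"], cooldown: false, failover: false },
];

function classifyMessage(message: string): Classification {
  const text = message.toLowerCase();
  for (const rule of RULES) {
    if (rule.patterns.some((p) => text.indexOf(p) !== -1)) {
      return { reason: rule.reason, shouldCooldown: rule.cooldown, shouldFailover: rule.failover };
    }
  }
  // Unmatched errors are treated as non-retriable in this sketch (an assumption).
  return { reason: "unknown", shouldCooldown: false, shouldFailover: false };
}

console.log(classifyMessage("429 Too Many Requests").reason);          // → rate_limit
console.log(classifyMessage("Invalid API key provided").reason);       // → auth
console.log(classifyMessage("400 Bad Request: invalid model").reason); // → client_error
```

Plain substring matching is deliberately naive: a message that merely contains the digits "500" would be treated as a server error. A production classifier would more likely inspect a structured status code first.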

Query with Fallback

The queryTextWithFallback function combines fallback chains with cooldown management:

import { queryTextWithFallback } from "polpo/llm";

const result = await queryTextWithFallback("Explain this code...", {
  primary: "anthropic:claude-sonnet-4-6",
  fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"]
});

console.log(result.text);       // The response
console.log(result.usedSpec);   // Which model actually answered, e.g. "openai:gpt-4o"
console.log(result.usage);      // Token usage
console.log(result.model);      // Full model metadata

Flow

queryTextWithFallback("prompt", { primary: "anthropic:...", fallbacks: ["openai:...", "google:..."] })
  │
  ├─ Try anthropic
  │   ├─ In cooldown? → skip
  │   ├─ Success? → return result, clear cooldown
  │   └─ Error? → classify
  │       ├─ shouldCooldown → markProviderCooldown("anthropic")
  │       ├─ shouldFailover → try next
  │       └─ else → throw immediately
  │
  ├─ Try openai
  │   └─ (same logic)
  │
  ├─ Try google
  │   └─ (same logic)
  │
  └─ All failed → throw last error
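
The flow above can be sketched as a loop over the candidate specs. This is not the actual `queryTextWithFallback` implementation: `runWithFallback` and the injected `FallbackDeps` functions are hypothetical stand-ins, chosen so the sketch runs without any provider SDK.

```typescript
interface Verdict {
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

// Hypothetical dependencies, injected so the loop can be exercised in isolation.
interface FallbackDeps {
  call: (spec: string) => Promise<string>;
  classify: (err: unknown) => Verdict;
  inCooldown: (provider: string) => boolean;
  markCooldown: (provider: string) => void;
  clearCooldown: (provider: string) => void;
}

async function runWithFallback(
  specs: string[],
  deps: FallbackDeps,
): Promise<{ text: string; usedSpec: string }> {
  let lastError: unknown = new Error("no model available");
  for (const spec of specs) {
    const idx = spec.indexOf(":"); // assumed "provider:model" format
    const provider = idx === -1 ? spec : spec.slice(0, idx);
    if (deps.inCooldown(provider)) continue; // skip cooling-down providers
    try {
      const text = await deps.call(spec);
      deps.clearCooldown(provider); // success resets the provider
      return { text, usedSpec: spec };
    } catch (err) {
      lastError = err;
      const verdict = deps.classify(err);
      if (verdict.shouldCooldown) deps.markCooldown(provider);
      if (!verdict.shouldFailover) throw err; // e.g. client errors surface immediately
    }
  }
  throw lastError; // every candidate failed or was skipped
}

// Usage: the first provider is rate limited, so the second one answers.
const cooling = new Set<string>();
const fallbackDemo = runWithFallback(["anthropic:claude-sonnet-4-6", "openai:gpt-4o"], {
  call: async (spec) => {
    if (spec.indexOf("anthropic:") === 0) throw new Error("429 rate limit");
    return "hello from " + spec;
  },
  classify: () => ({ shouldCooldown: true, shouldFailover: true }),
  inCooldown: (p) => cooling.has(p),
  markCooldown: (p) => { cooling.add(p); },
  clearCooldown: (p) => { cooling.delete(p); },
});
fallbackDemo.then((res) => console.log(res.usedSpec, Array.from(cooling)));
```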

Billing Disable

Billing errors (insufficient credits, quota exceeded, 402 Payment Required) trigger a separate disable mechanism with much longer backoff than regular cooldown. This is because billing issues are persistent — retrying in 1 minute won’t help if your credits are depleted.

Billing Backoff

Consecutive Billing Failures	Disable Duration
1st failure	5 hours
2nd failure	10 hours
3rd failure	20 hours
4th+ failure	24 hours (cap)

The billing error counter resets after 24 hours without a billing failure.

vs. Regular Cooldown

Aspect	Regular Cooldown	Billing Disable
Triggers	Rate limit, auth error, server error	Insufficient credits, quota exceeded
Backoff curve	1min -> 5min -> 25min -> 1h	5h -> 10h -> 20h -> 24h
Assumption	Transient — will resolve quickly	Persistent — needs payment or plan upgrade
Level	Provider-level	Auth profile-level

Both mechanisms are tracked per auth profile, not per provider. If one API key is out of credits but another has quota, Polpo automatically rotates to the working one.

Billing disable timing can be configured in polpo.json:

{
  "settings": {
    "billingDisable": {
      "billingBackoffHours": 5,
      "billingMaxHours": 24,
      "failureWindowHours": 24
    }
  }
}
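
The billing backoff table is consistent with a doubling rule that starts at `billingBackoffHours` and is capped at `billingMaxHours`. The sketch below assumes that rule (the source does not state it explicitly); `billingDisableHours` is a hypothetical helper and `failureWindowHours` is left out for brevity.

```typescript
interface BillingDisableSettings {
  billingBackoffHours: number; // disable duration after the 1st billing failure
  billingMaxHours: number;     // upper bound on the disable duration
}

// Hypothetical helper; the doubling rule is inferred from the table, not confirmed.
function billingDisableHours(consecutiveFailures: number, s: BillingDisableSettings): number {
  if (consecutiveFailures < 1) return 0;
  const raw = s.billingBackoffHours * Math.pow(2, consecutiveFailures - 1);
  return Math.min(raw, s.billingMaxHours);
}

// Values taken from the polpo.json example above.
const exampleSettings: BillingDisableSettings = { billingBackoffHours: 5, billingMaxHours: 24 };

for (const n of [1, 2, 3, 4]) {
  console.log("billing failure " + n + ": " + billingDisableHours(n, exampleSettings) + "h");
}
// billing failure 1: 5h
// billing failure 2: 10h
// billing failure 3: 20h
// billing failure 4: 24h
```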

Auth Profile Rotation

When you have multiple auth profiles for the same provider (e.g., personal + work OAuth, OAuth + API key), Polpo selects the best profile for each request using a rotation algorithm:

OAuth profiles before API key profiles — OAuth supports auto-refresh
Oldest lastUsed first — round-robin for even distribution
Skip profiles in cooldown — temporary errors
Skip billing-disabled profiles — insufficient credits

This ensures:

Even usage distribution across credentials
Automatic failover when one credential hits rate limits
No wasted calls on billing-disabled profiles
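
One way to picture the rotation rules is a filter followed by a sort. The sketch below is an assumption-laden illustration: the `AuthProfile` fields and `pickProfile` are hypothetical and do not reflect Polpo's actual profile schema.

```typescript
// Hypothetical profile shape; all timestamps are epoch milliseconds.
interface AuthProfile {
  id: string;
  kind: "oauth" | "api_key";
  lastUsed: number;
  cooldownUntil?: number;
  billingDisabledUntil?: number;
}

function pickProfile(profiles: AuthProfile[], now: number): AuthProfile | undefined {
  // Skip profiles that are in cooldown or billing-disabled.
  const usable = profiles.filter(
    (p) => (p.cooldownUntil ?? 0) <= now && (p.billingDisabledUntil ?? 0) <= now,
  );
  usable.sort((a, b) => {
    if (a.kind !== b.kind) return a.kind === "oauth" ? -1 : 1; // OAuth before API key
    return a.lastUsed - b.lastUsed; // then oldest lastUsed first
  });
  return usable[0];
}

const exampleProfiles: AuthProfile[] = [
  { id: "work-oauth", kind: "oauth", lastUsed: 300 },
  { id: "personal-oauth", kind: "oauth", lastUsed: 100, cooldownUntil: 2000 },
  { id: "key-a", kind: "api_key", lastUsed: 50 },
];

// personal-oauth is older but in cooldown, so work-oauth is chosen.
console.log(pickProfile(exampleProfiles, 1000)?.id); // → work-oauth
```

Because `filter` returns a new array, the sort does not reorder the caller's profile list.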

See the Authentication guide for full details on profiles, rotation, and session stickiness.

Escalation Model Override

Orthogonal to fallback chains, Polpo supports escalation model override — when a task fails enough times, the retry uses a more capable (usually more expensive) model:

{
  "retryPolicy": {
    "escalateAfter": 2,
    "fallbackAgent": "senior-dev",
    "escalateModel": "anthropic:claude-opus-4-6"
  }
}

This is different from provider failover:

Fallback chains handle provider-level failures (rate limits, outages)
Escalation handles task-level failures (agent produced wrong output)

Both can work together — an escalated model can itself have a fallback chain configured.
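
As a rough illustration of the escalation decision, the sketch below picks the model for the next attempt of a task. It assumes escalation starts once the task's failure count reaches `escalateAfter`; the exact threshold semantics are not stated above. `modelForAttempt` is a hypothetical helper, and `fallbackAgent` is ignored here.

```typescript
// Mirrors the retryPolicy fields shown above.
interface RetryPolicy {
  escalateAfter: number;
  fallbackAgent?: string;
  escalateModel?: string;
}

// Hypothetical helper: which model the next attempt of a task should use.
function modelForAttempt(defaultModel: string, failures: number, policy: RetryPolicy): string {
  // Assumption: escalate once failures >= escalateAfter.
  if (policy.escalateModel !== undefined && failures >= policy.escalateAfter) {
    return policy.escalateModel;
  }
  return defaultModel;
}

const examplePolicy: RetryPolicy = {
  escalateAfter: 2,
  fallbackAgent: "senior-dev",
  escalateModel: "anthropic:claude-opus-4-6",
};

console.log(modelForAttempt("anthropic:claude-sonnet-4-6", 1, examplePolicy)); // → anthropic:claude-sonnet-4-6
console.log(modelForAttempt("anthropic:claude-sonnet-4-6", 2, examplePolicy)); // → anthropic:claude-opus-4-6
```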

Retry Integration

Fallback works on top of Polpo’s existing retry system. The full error handling pipeline is:

Retry — withRetry retries transient network errors (up to 2 times)
Failover — queryTextWithFallback switches providers on classified errors
Cooldown — Failed providers are temporarily skipped
Escalation — After repeated task failures, Polpo switches to a more capable model

When using querySDKWithFallback, the retry count is reduced to 1 (instead of 2) since the fallback mechanism already provides resilience against provider failures.
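
The first stage of the pipeline, retrying transient errors on the same provider, can be sketched as follows. `retryTransient` is a hypothetical stand-in for `withRetry`, whose real signature is not shown here, and the sketch retries immediately without any delay between attempts.

```typescript
// Hypothetical stand-in for withRetry: retries only errors flagged as transient.
async function retryTransient<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxRetries = 2,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      attempt += 1;
    }
  }
}

// Usage: two transient failures, then success on the third call.
let calls = 0;
const retryDemo = retryTransient(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error("socket hang up");
    return "ok after " + calls + " calls";
  },
  (err) => err instanceof Error && /timeout|econnreset|socket hang up/i.test(err.message),
);
retryDemo.then((msg) => console.log(msg)); // → ok after 3 calls
```

With `maxRetries` set to 1, the same call would surface the second "socket hang up" error, leaving recovery to the failover stage.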
