Polpo automatically switches to backup models when a provider fails. This covers rate limits, auth errors, billing issues, and server outages — your agents keep running even when a provider goes down.

Model Fallback Chains

Instead of specifying a single model string, you can use a ModelConfig object with a primary model and ordered fallbacks:
{
  "settings": {
    "orchestratorModel": {
      "primary": "anthropic:claude-sonnet-4-6",
      "fallbacks": [
        "openai:gpt-4o",
        "google:gemini-2.5-pro",
        "groq:llama-3.3-70b-versatile"
      ]
    }
  }
}

How Resolution Works

  1. The primary model is tried first.
  2. If the primary’s provider has no API key, is in cooldown, or returns a retriable error, the first fallback is tried.
  3. Polpo continues through the fallback list in order.
  4. If all providers fail or are in cooldown, the last error is thrown.
interface ModelConfig {
  /** Primary model spec (e.g. "anthropic:claude-opus-4-6"). */
  primary?: string;
  /** Ordered fallback models — tried when primary fails. */
  fallbacks?: string[];
}
Put your preferred model first, a similar-capability alternative second, and a free/always-available model last as an emergency fallback.
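The resolution order can be sketched as a pure function that filters the configured chain. This is a minimal illustration under stated assumptions, not Polpo's implementation: `resolveCandidates`, `providerOf`, `hasApiKey`, and `inCooldown` are hypothetical names, and the provider is assumed to be the part of the spec before the colon.

```typescript
// Hypothetical sketch of the resolution order; not Polpo's actual code.
interface ModelConfig {
  primary?: string;
  fallbacks?: string[];
}

// Assumption: a spec looks like "provider:model-id".
function providerOf(spec: string): string {
  const idx = spec.indexOf(":");
  return idx === -1 ? spec : spec.slice(0, idx);
}

// Returns the specs that are eligible to be tried, in priority order.
function resolveCandidates(
  config: ModelConfig,
  hasApiKey: (provider: string) => boolean,
  inCooldown: (provider: string) => boolean,
): string[] {
  const ordered = [
    ...(config.primary ? [config.primary] : []),
    ...(config.fallbacks ?? []),
  ];
  return ordered.filter((spec) => {
    const provider = providerOf(spec);
    return hasApiKey(provider) && !inCooldown(provider);
  });
}

const candidates = resolveCandidates(
  {
    primary: "anthropic:claude-sonnet-4-6",
    fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"],
  },
  (provider) => provider !== "google",    // pretend no Google key is set
  (provider) => provider === "anthropic", // pretend Anthropic is cooling down
);
// candidates → ["openai:gpt-4o"]
```

Errors returned by the API call itself are not modeled here; they only become known once a candidate is actually tried.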

Fallback Triggers

Failover happens when:
  • The provider’s API key is missing (checked before making the call)
  • The provider is currently in cooldown (from a recent error)
  • The API call returns a retriable error (rate limit, auth, billing, server error)
Failover does not happen for:
  • Client errors (400 Bad Request, invalid model ID): these indicate a config problem, not a provider issue
  • Transient network errors (timeout, connection reset): these are retried on the same provider first

Provider Cooldown

When a provider returns certain errors, Polpo puts it into cooldown — a temporary period where that provider is skipped in favor of alternatives.

Cooldown Backoff

Cooldown uses exponential backoff to avoid hammering a failing provider:
Consecutive Failures | Cooldown Duration
1st failure          | 1 minute
2nd failure          | 5 minutes
3rd failure          | 25 minutes
4th+ failure         | 1 hour (cap)
A successful call to a provider resets its cooldown counter to zero.
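The durations in the table are consistent with multiplying by five on each consecutive failure and capping at one hour. The sketch below reproduces the table under that assumption; the formula is inferred from the listed values, and `cooldownMs` is a hypothetical helper rather than part of Polpo's API.

```typescript
// Hypothetical helper reproducing the table: 1 min, 5 min, 25 min, then a 1 hour cap.
const MINUTE_MS = 60_000;
const COOLDOWN_CAP_MS = 60 * MINUTE_MS;

function cooldownMs(consecutiveFailures: number): number {
  if (consecutiveFailures < 1) return 0; // no failures, no cooldown
  // Assumed rule: the duration grows by a factor of 5 per failure, up to the cap.
  return Math.min(MINUTE_MS * 5 ** (consecutiveFailures - 1), COOLDOWN_CAP_MS);
}

const minutes = [1, 2, 3, 4, 5].map((n) => cooldownMs(n) / MINUTE_MS);
// minutes → [1, 5, 25, 60, 60]
```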

Cooldown Triggers

Error Type                              | Triggers Cooldown? | Triggers Failover?
Auth error (401, 403)                   | Yes                | Yes
Rate limit (429)                        | Yes                | Yes
Billing (402, insufficient credits)     | Yes                | Yes
Server error (500, 502, 503, 504)       | Yes                | Yes
Network transient (timeout, ECONNRESET) | No                 | No (retried)
Client error (400, 404, invalid model)  | No                 | No (thrown)

Monitoring Cooldowns

You can inspect which providers are currently in cooldown:
import { getProviderCooldowns } from "polpo/llm";

const cooldowns = getProviderCooldowns();
// → {
//   "anthropic": { until: 1708012800000, errorCount: 2, reason: "rate_limit" },
//   "openai": { until: 1708012500000, errorCount: 1, reason: "server_error" }
// }
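
A snapshot like the one above can be turned into readable status lines. The sketch below assumes the entry shape shown in the example output and treats `until` as epoch milliseconds; `describeCooldowns` is a hypothetical helper, and the snapshot is passed in as plain data so the example runs without Polpo installed.

```typescript
// Shape assumed from the example output; `until` is taken to be epoch milliseconds.
interface CooldownEntry {
  until: number;
  errorCount: number;
  reason: string;
}

// Hypothetical helper: one status line per provider that is still cooling down.
function describeCooldowns(
  snapshot: Record<string, CooldownEntry>,
  now: number,
): string[] {
  const lines: string[] = [];
  for (const provider of Object.keys(snapshot)) {
    const entry = snapshot[provider];
    if (!entry || entry.until <= now) continue; // already expired
    const secondsLeft = Math.ceil((entry.until - now) / 1000);
    lines.push(
      `${provider}: ${entry.reason}, ${secondsLeft}s left (${entry.errorCount} errors)`,
    );
  }
  return lines;
}

const statusLines = describeCooldowns(
  {
    anthropic: { until: 1708012800000, errorCount: 2, reason: "rate_limit" },
    openai: { until: 1708012500000, errorCount: 1, reason: "server_error" },
  },
  1708012600000, // a fixed "now" keeps the example deterministic
);
// statusLines → ["anthropic: rate_limit, 200s left (2 errors)"]
```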

Error Classification

Polpo classifies every provider error to determine the correct response:
// queryText is assumed to be exported from the same module as classifyProviderError.
import { classifyProviderError, queryText } from "polpo/llm";

try {
  await queryText("hello", "anthropic:claude-sonnet-4-6");
} catch (err) {
  const classified = classifyProviderError(err);
  // → { shouldCooldown: true, shouldFailover: true, reason: "rate_limit" }
}

Classification Rules

Pattern                                              | Reason       | Action
401, unauthorized, invalid api key, forbidden, 403   | auth         | Cooldown + failover
429, rate limit, too many requests, quota exceeded   | rate_limit   | Cooldown + failover
402, insufficient, credit, billing, payment required | billing      | Cooldown + failover
500, 502, 503, 504, overloaded, service unavailable  | server_error | Cooldown + failover
timeout, econnreset, econnrefused, socket hang up    | network      | Retry (no cooldown)
400, invalid, not found, 404                         | client_error | Throw (no retry)
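
To make the rule ordering concrete, here is a minimal sketch of a matcher built from the table. It is a hypothetical re-implementation, not Polpo's `classifyProviderError`: it assumes case-insensitive substring matching on the error message and assumes rules are checked top to bottom. Plain substring matching is crude (a request ID containing "500" would match), so treat it as an illustration only.

```typescript
// Hypothetical re-implementation of the table; Polpo's real matcher may differ.
type Reason =
  | "auth" | "rate_limit" | "billing" | "server_error" | "network" | "client_error";

interface Classification {
  reason: Reason;
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

// Checked top to bottom, so "invalid api key" matches auth before the broader
// "invalid" pattern of client_error is reached.
const RULES: Array<{ reason: Reason; patterns: string[]; failover: boolean }> = [
  { reason: "auth", failover: true,
    patterns: ["401", "unauthorized", "invalid api key", "forbidden", "403"] },
  { reason: "rate_limit", failover: true,
    patterns: ["429", "rate limit", "too many requests", "quota exceeded"] },
  { reason: "billing", failover: true,
    patterns: ["402", "insufficient", "credit", "billing", "payment required"] },
  { reason: "server_error", failover: true,
    patterns: ["500", "502", "503", "504", "overloaded", "service unavailable"] },
  { reason: "network", failover: false,
    patterns: ["timeout", "econnreset", "econnrefused", "socket hang up"] },
  { reason: "client_error", failover: false,
    patterns: ["400", "invalid", "not found", "404"] },
];

function classifyMessage(message: string): Classification | undefined {
  const text = message.toLowerCase();
  for (const rule of RULES) {
    if (rule.patterns.some((pattern) => text.includes(pattern))) {
      // In the table, cooldown and failover always go together.
      return {
        reason: rule.reason,
        shouldCooldown: rule.failover,
        shouldFailover: rule.failover,
      };
    }
  }
  return undefined; // unknown errors are left to the caller
}

const rateLimited = classifyMessage("429 Too Many Requests");
// rateLimited → { reason: "rate_limit", shouldCooldown: true, shouldFailover: true }
```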

Query with Fallback

The queryTextWithFallback function combines fallback chains with cooldown management:
import { queryTextWithFallback } from "polpo/llm";

const result = await queryTextWithFallback("Explain this code...", {
  primary: "anthropic:claude-sonnet-4-6",
  fallbacks: ["openai:gpt-4o", "google:gemini-2.5-pro"]
});

console.log(result.text);       // The response
console.log(result.usedSpec);   // Which model actually answered, e.g. "openai:gpt-4o"
console.log(result.usage);      // Token usage
console.log(result.model);      // Full model metadata

Flow

queryTextWithFallback("prompt", { primary: "anthropic:...", fallbacks: ["openai:...", "google:..."] })

  ├─ Try anthropic
  │   ├─ In cooldown? → skip
  │   ├─ Success? → return result, clear cooldown
  │   └─ Error? → classify
  │       ├─ shouldCooldown → markProviderCooldown("anthropic")
  │       ├─ shouldFailover → try next
  │       └─ else → throw immediately

  ├─ Try openai
  │   └─ (same logic)

  ├─ Try google
  │   └─ (same logic)

  └─ All failed → throw last error
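
The flow above can be sketched as a loop over the candidate specs. This is an illustration under stated assumptions, not Polpo's source: `queryWithFallbackSketch`, `callModel`, `classify`, and the `coolingDown` set are hypothetical stand-ins, and cooldown expiry and failure counters are left out to keep the control flow visible.

```typescript
// Hypothetical sketch of the flow; Polpo's real functions and signatures may differ.
interface Verdict {
  shouldCooldown: boolean;
  shouldFailover: boolean;
}

// Assumption: a spec looks like "provider:model-id".
function providerName(spec: string): string {
  const idx = spec.indexOf(":");
  return idx === -1 ? spec : spec.slice(0, idx);
}

async function queryWithFallbackSketch(
  specs: string[], // primary first, then fallbacks
  callModel: (spec: string) => Promise<string>,
  classify: (err: unknown) => Verdict,
  coolingDown: Set<string>, // providers currently skipped
): Promise<{ text: string; usedSpec: string }> {
  let lastError: unknown = new Error("no model available");
  for (const spec of specs) {
    const provider = providerName(spec);
    if (coolingDown.has(provider)) continue; // in cooldown: skip
    try {
      const text = await callModel(spec);
      // A full implementation would also reset the provider's failure counter here.
      return { text, usedSpec: spec };
    } catch (err) {
      lastError = err;
      const verdict = classify(err);
      if (verdict.shouldCooldown) coolingDown.add(provider);
      if (!verdict.shouldFailover) throw err; // e.g. client errors
    }
  }
  throw lastError; // every candidate failed or was skipped
}

const cooling = new Set<string>();
const demo = queryWithFallbackSketch(
  ["anthropic:claude-sonnet-4-6", "openai:gpt-4o"],
  async (spec) => {
    if (spec.startsWith("anthropic:")) throw new Error("429 rate limit");
    return `answer from ${spec}`;
  },
  () => ({ shouldCooldown: true, shouldFailover: true }),
  cooling,
);
// demo resolves with usedSpec "openai:gpt-4o"; `cooling` then contains "anthropic".
```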

Billing Disable

Billing errors (insufficient credits, quota exceeded, 402 Payment Required) trigger a separate disable mechanism with much longer backoff than regular cooldown. This is because billing issues are persistent — retrying in 1 minute won’t help if your credits are depleted.

Billing Backoff

Consecutive Billing Failures | Disable Duration
1st failure                  | 5 hours
2nd failure                  | 10 hours
3rd failure                  | 20 hours
4th+ failure                 | 24 hours (cap)
The billing error counter resets after 24 hours without a billing failure.

vs. Regular Cooldown

Aspect        | Regular Cooldown                     | Billing Disable
Triggers      | Rate limit, auth error, server error | Insufficient credits, quota exceeded
Backoff curve | 1min -> 5min -> 25min -> 1h          | 5h -> 10h -> 20h -> 24h
Assumption    | Transient: will resolve quickly      | Persistent: needs payment or plan upgrade
Level         | Provider-level                       | Auth profile-level
Both mechanisms are tracked per auth profile, not per provider. If one API key is out of credits but another has quota, Polpo automatically rotates to the working one.
Billing disable timing can be configured in polpo.json:
{
  "settings": {
    "billingDisable": {
      "billingBackoffHours": 5,
      "billingMaxHours": 24,
      "failureWindowHours": 24
    }
  }
}
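
The billing backoff table is consistent with doubling the disable duration on each consecutive failure, starting at `billingBackoffHours` and capped at `billingMaxHours`. The sketch below reproduces it under that assumption; the doubling rule is inferred from the table, `billingDisableHours` is a hypothetical helper, and `failureWindowHours` (the counter reset window) is not modeled.

```typescript
// Hypothetical helper; the doubling rule is inferred from the table (5h, 10h, 20h, 24h cap).
interface BillingDisableSettings {
  billingBackoffHours: number; // duration after the first billing failure
  billingMaxHours: number;     // upper bound on the disable duration
}

function billingDisableHours(
  consecutiveFailures: number,
  settings: BillingDisableSettings,
): number {
  if (consecutiveFailures < 1) return 0;
  const uncapped = settings.billingBackoffHours * 2 ** (consecutiveFailures - 1);
  return Math.min(uncapped, settings.billingMaxHours);
}

// Values taken from the polpo.json example.
const shown: BillingDisableSettings = { billingBackoffHours: 5, billingMaxHours: 24 };
const hours = [1, 2, 3, 4].map((n) => billingDisableHours(n, shown));
// hours → [5, 10, 20, 24]
```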

Auth Profile Rotation

When you have multiple auth profiles for the same provider (e.g., personal + work OAuth, OAuth + API key), Polpo selects the best profile for each request using a rotation algorithm:
  1. OAuth profiles before API key profiles — OAuth supports auto-refresh
  2. Oldest lastUsed first — round-robin for even distribution
  3. Skip profiles in cooldown — temporary errors
  4. Skip billing-disabled profiles — insufficient credits
This ensures:
  • Even usage distribution across credentials
  • Automatic failover when one credential hits rate limits
  • No wasted calls on billing-disabled profiles
See the Authentication guide for full details on profiles, rotation, and session stickiness.
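
As a rough illustration of the selection rules, the sketch below filters out unusable profiles and then sorts the rest. Only `lastUsed` comes from the text above; the `AuthProfile` shape, its other field names, and `pickProfile` are assumptions made for the example.

```typescript
// Hypothetical profile shape and selector; field names other than `lastUsed` are assumptions.
interface AuthProfile {
  id: string;
  kind: "oauth" | "api_key";
  lastUsed: number; // epoch ms of the most recent request
  inCooldown: boolean;
  billingDisabled: boolean;
}

function pickProfile(profiles: AuthProfile[]): AuthProfile | undefined {
  // Rules 3 and 4: drop profiles that are cooling down or billing-disabled.
  const usable = profiles.filter((p) => !p.inCooldown && !p.billingDisabled);
  usable.sort((a, b) => {
    if (a.kind !== b.kind) return a.kind === "oauth" ? -1 : 1; // rule 1: OAuth first
    return a.lastUsed - b.lastUsed; // rule 2: oldest lastUsed first
  });
  return usable[0];
}

const chosen = pickProfile([
  { id: "work-oauth", kind: "oauth", lastUsed: 300, inCooldown: true, billingDisabled: false },
  { id: "personal-oauth", kind: "oauth", lastUsed: 200, inCooldown: false, billingDisabled: false },
  { id: "api-key", kind: "api_key", lastUsed: 100, inCooldown: false, billingDisabled: false },
]);
// chosen?.id → "personal-oauth"
```

The API key profile has the oldest `lastUsed`, but the remaining OAuth profile still wins because profile type is compared first.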

Escalation Model Override

Orthogonal to fallback chains, Polpo supports escalation model override — when a task fails enough times, the retry uses a more capable (usually more expensive) model:
{
  "retryPolicy": {
    "escalateAfter": 2,
    "fallbackAgent": "senior-dev",
    "escalateModel": "anthropic:claude-opus-4-6"
  }
}
This is different from provider failover:
  • Fallback chains handle provider-level failures (rate limits, outages)
  • Escalation handles task-level failures (agent produced wrong output)
Both can work together — an escalated model can itself have a fallback chain configured.
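
A minimal sketch of how the model for a retry might be chosen is shown below. It assumes escalation starts once the failure count reaches `escalateAfter`; the exact threshold semantics are not stated here, `modelForAttempt` is a hypothetical helper, and `fallbackAgent` is not modeled.

```typescript
// Hypothetical sketch; assumes escalation starts once failures reach `escalateAfter`.
interface RetryPolicy {
  escalateAfter: number;
  escalateModel?: string;
}

function modelForAttempt(
  baseModel: string,
  failures: number,
  policy: RetryPolicy,
): string {
  if (policy.escalateModel !== undefined && failures >= policy.escalateAfter) {
    return policy.escalateModel;
  }
  return baseModel;
}

const policy: RetryPolicy = {
  escalateAfter: 2,
  escalateModel: "anthropic:claude-opus-4-6",
};
const picked = [0, 1, 2, 3].map((failures) =>
  modelForAttempt("anthropic:claude-sonnet-4-6", failures, policy),
);
// picked → sonnet, sonnet, opus, opus
```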

Retry Integration

Fallback works on top of Polpo’s existing retry system. The full error handling pipeline is:
  1. Retry: withRetry retries transient network errors (up to 2 times)
  2. Failover: queryTextWithFallback switches providers on classified errors
  3. Cooldown: failed providers are temporarily skipped
  4. Escalation: after repeated task failures, Polpo switches to a more capable model
When using querySDKWithFallback, the retry count is reduced to 1 (instead of 2) since the fallback mechanism already provides resilience against provider failures.
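
The retry stage can be pictured as a small wrapper around the call. The sketch below is not Polpo's `withRetry`; `retryTransient` and `isTransient` are hypothetical names, and any delay between attempts is omitted.

```typescript
// Hypothetical sketch of a retry wrapper; not Polpo's `withRetry` implementation.
async function retryTransient<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxRetries = 2,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      // Give up when retries are exhausted or the error is not transient.
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      attempt += 1; // a real wrapper would likely wait before retrying
    }
  }
}

let calls = 0;
const flaky = retryTransient(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error("ECONNRESET");
    return "ok";
  },
  (err) => err instanceof Error && /timeout|econnreset/i.test(err.message),
);
// flaky resolves to "ok" on the third call (one initial attempt plus two retries)
```

Passing `maxRetries` as 1 corresponds to the reduced retry count described above for the fallback path.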