Polpo tracks per-call cost for every LLM interaction using pricing data from the @mariozechner/pi-ai model catalog. Every model in the catalog includes up-to-date pricing for input tokens, output tokens, cache reads, and cache writes.
## How Cost Tracking Works
Every query through Polpo’s LLM layer returns usage data alongside the response:
```typescript
import { querySDKTextDetailed } from "polpo/llm";

const result = await querySDKTextDetailed(
  "Refactor this function for readability",
  "/path/to/workdir",
  "anthropic:claude-sonnet-4-6"
);

console.log(result.text);    // The response
console.log(result.usage);   // { inputTokens: 1250, outputTokens: 450, ... }
console.log(result.costUsd); // 0.00875 (in USD)
```
### Usage Object

The `usage` field returned by pi-ai contains:
| Field | Description |
|---|---|
| `inputTokens` | Number of input tokens processed |
| `outputTokens` | Number of output tokens generated |
| `cacheReadTokens` | Tokens served from provider cache |
| `cacheWriteTokens` | Tokens written to provider cache |
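The fields above can be modeled as a small interface. This is a sketch of the shape described in the table, not the exact type exported by pi-ai, and the `totalTokens` helper is added here purely for illustration:

```typescript
// Sketch of the usage shape described above; the exact exported
// type name in pi-ai may differ.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Illustrative helper: total tokens involved in a call.
function totalTokens(u: Usage): number {
  return u.inputTokens + u.outputTokens + u.cacheReadTokens + u.cacheWriteTokens;
}

const usage: Usage = {
  inputTokens: 1250,
  outputTokens: 450,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
};
console.log(totalTokens(usage)); // → 1700
```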
### Cost Calculation
Cost is calculated using the model’s pricing metadata:
```typescript
import { estimateCost, resolveModel } from "polpo/llm";

const model = resolveModel("anthropic:claude-sonnet-4-6");
const cost = estimateCost(model, {
  inputTokens: 10000,
  outputTokens: 2000,
  cacheReadTokens: 5000,
  cacheWriteTokens: 0,
});

console.log(cost);
// → {
//   inputCost: 0.03,
//   outputCost: 0.03,
//   cacheReadCost: 0.0015,
//   cacheWriteCost: 0,
//   totalCost: 0.0615,
//   currency: "USD"
// }
```
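The arithmetic behind that result is straightforward: each token count is divided by one million and multiplied by the model's per-million rate. A minimal sketch using the claude-sonnet-4-6 rates listed in the tables below (the `costFor` helper is illustrative, not part of the polpo API):

```typescript
// Per-million-token rates for claude-sonnet-4-6, as in the catalog tables below.
const rates = { input: 3.0, output: 15.0, cacheRead: 0.3 };

// Illustrative helper: dollars for a token count at a $/1M rate.
const costFor = (tokens: number, ratePerMillion: number): number =>
  (tokens / 1_000_000) * ratePerMillion;

const inputCost = costFor(10_000, rates.input);        // 0.03
const outputCost = costFor(2_000, rates.output);       // 0.03
const cacheReadCost = costFor(5_000, rates.cacheRead); // 0.0015
const totalCost = inputCost + outputCost + cacheReadCost;
console.log(totalCost.toFixed(4)); // → "0.0615"
```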
## Detailed vs Simple Queries
Polpo provides two query patterns:
### Simple (text only)

```typescript
import { querySDKText } from "polpo/llm";

// Returns just the text string; no usage or cost data
const text = await querySDKText(prompt, cwd, model);
```
### Detailed (text, usage, and cost)

```typescript
import { querySDKTextDetailed, querySDKStreamDetailed } from "polpo/llm";

// Returns text + usage + cost + model metadata
const result = await querySDKTextDetailed(prompt, cwd, model);

result.text;    // string
result.usage;   // Usage | undefined
result.costUsd; // number | undefined
result.model;   // Model object with full metadata

// Streaming variant
const streamResult = await querySDKStreamDetailed(prompt, cwd, model, (delta) => {
  process.stdout.write(delta);
});
```
### With Fallback
```typescript
import { querySDKWithFallback } from "polpo/llm";

const result = await querySDKWithFallback(prompt, cwd, {
  primary: "anthropic:claude-sonnet-4-6",
  fallbacks: ["openai:gpt-4o"]
});

result.usedSpec; // Which model was actually used
result.costUsd;  // Cost based on the model that actually ran
```
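Conceptually, a fallback chain tries each model spec in order and records which one actually ran. The following is a hedged sketch of that control flow, not the actual querySDKWithFallback implementation:

```typescript
// Illustrative fallback loop: try each spec until one succeeds.
async function withFallback<T>(
  specs: string[],
  run: (spec: string) => Promise<T>
): Promise<{ usedSpec: string; result: T }> {
  let lastError: unknown;
  for (const spec of specs) {
    try {
      return { usedSpec: spec, result: await run(spec) };
    } catch (err) {
      lastError = err; // this model failed; try the next one
    }
  }
  throw lastError; // every model in the chain failed
}
```

Because `usedSpec` records the model that actually ran, downstream cost accounting stays accurate even when the primary is unavailable.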
## Cost by Provider
Reference pricing for popular models (as of catalog data):
### Anthropic

| Model | Input $/1M | Output $/1M | Cache Read $/1M |
|---|---|---|---|
| `claude-opus-4-6` | $15.00 | $75.00 | $1.50 |
| `claude-sonnet-4-6` | $3.00 | $15.00 | $0.30 |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 | $0.08 |
### OpenAI

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `o3` | $10.00 | $40.00 |
| `o4-mini` | $1.10 | $4.40 |
### Google

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.15 | $0.60 |
### Free Models

| Model | Cost |
|---|---|
| `opencode:big-pickle` | Free |
| Any Ollama/vLLM local model | Free |
Pricing data comes from the pi-ai catalog and may not reflect the latest changes from providers. Always check the provider’s official pricing page for the most current rates.
## Cost Optimization Strategies
### 1. Use the Right Model for the Job

Don't use `claude-opus-4-6` for formatting tasks. Match model capability to task complexity:
```json
{
  "agents": [
    { "name": "architect", "model": "anthropic:claude-opus-4-6" },
    { "name": "coder", "model": "anthropic:claude-sonnet-4-6" },
    { "name": "formatter", "model": "anthropic:claude-haiku-4-5-20251001" }
  ]
}
```
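To see why this matters, compare the same small task priced at opus versus haiku rates from the tables above. The token counts here are invented for illustration:

```typescript
// Illustrative $/1M helper and a hypothetical small formatting task.
const perMillion = (tokens: number, rate: number): number =>
  (tokens / 1_000_000) * rate;
const task = { inputTokens: 5_000, outputTokens: 1_000 };

// claude-opus-4-6: $15/1M input, $75/1M output
const opusCost = perMillion(task.inputTokens, 15.0) + perMillion(task.outputTokens, 75.0);
// claude-haiku-4-5: $0.80/1M input, $4/1M output
const haikuCost = perMillion(task.inputTokens, 0.8) + perMillion(task.outputTokens, 4.0);

console.log(opusCost.toFixed(3));                  // → "0.150"
console.log(haikuCost.toFixed(3));                 // → "0.008"
console.log(Math.round(opusCost / haikuCost));     // → 19 (roughly 19x more expensive)
```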
### 2. Use Local Models for Simple Tasks

For boilerplate, formatting, and linting fixes, use a local model:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions"
    }
  },
  "agents": [
    { "name": "simple-tasks", "model": "ollama:qwen2.5-coder:7b" }
  ]
}
```
### 3. Use Fallback Chains with Cost Awareness
Put cheaper models in the fallback chain:
```json
{
  "settings": {
    "orchestratorModel": {
      "primary": "anthropic:claude-sonnet-4-6",
      "fallbacks": [
        "google:gemini-2.5-flash",
        "groq:llama-3.3-70b-versatile"
      ]
    }
  }
}
```
### 4. Use the Model Allowlist

Restrict which models can be used to prevent accidental use of expensive models:
```json
{
  "settings": {
    "modelAllowlist": {
      "anthropic:claude-sonnet-4-6": { "alias": "Sonnet" },
      "anthropic:claude-haiku-4-5-20251001": { "alias": "Haiku" },
      "groq:llama-3.3-70b-versatile": { "alias": "Llama" }
    }
  }
}
```
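Enforcement of an allowlist amounts to a lookup against the configured map. This helper is illustrative only; Polpo's actual enforcement lives inside the orchestrator:

```typescript
// Allowlist shape matching the settings example above.
const modelAllowlist: Record<string, { alias: string }> = {
  "anthropic:claude-sonnet-4-6": { alias: "Sonnet" },
  "anthropic:claude-haiku-4-5-20251001": { alias: "Haiku" },
  "groq:llama-3.3-70b-versatile": { alias: "Llama" },
};

// Illustrative check: a spec is usable only if it appears in the allowlist.
function isAllowed(spec: string): boolean {
  return Object.prototype.hasOwnProperty.call(modelAllowlist, spec);
}

console.log(isAllowed("anthropic:claude-sonnet-4-6")); // → true
console.log(isAllowed("anthropic:claude-opus-4-6"));   // → false
```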
### 5. Leverage Prompt Caching
Models that support prompt caching (Anthropic, OpenAI) can significantly reduce costs for repeated similar prompts: cache reads are typically priced 90% below regular input tokens.
Cache read pricing:
| Model | Regular Input | Cache Read | Savings |
|---|---|---|---|
| `claude-sonnet-4-6` | $3.00/1M | $0.30/1M | 90% |
| `claude-opus-4-6` | $15.00/1M | $1.50/1M | 90% |
| `claude-haiku-4-5-20251001` | $0.80/1M | $0.08/1M | 90% |
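The savings column follows directly from the two rates. A worked check using the Sonnet figures above:

```typescript
// Sonnet rates from the table above, in $/1M tokens.
const regularInputRate = 3.0;
const cacheReadRate = 0.3;

// Cached tokens cost a tenth of regular input tokens.
const savingsPct = Math.round((1 - cacheReadRate / regularInputRate) * 100);
console.log(`${savingsPct}% cheaper for cache reads`); // → "90% cheaper for cache reads"
```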