Polpo tracks per-call cost for every LLM interaction using pricing data from the @mariozechner/pi-ai model catalog. Every model in the catalog includes up-to-date pricing for input tokens, output tokens, cache reads, and cache writes.
## How Cost Tracking Works
Every query through Polpo’s LLM layer returns usage data alongside the response:
```typescript
import { querySDKTextDetailed } from "polpo/llm";

const result = await querySDKTextDetailed(
  "Refactor this function for readability",
  "/path/to/workdir",
  "anthropic:claude-sonnet-4-6"
);

console.log(result.text);    // The response
console.log(result.usage);   // { inputTokens: 1250, outputTokens: 450, ... }
console.log(result.costUsd); // 0.00875 (in USD)
```
### Usage Object

The `usage` field returned by pi-ai contains:
| Field | Description |
|---|---|
| `inputTokens` | Number of input tokens processed |
| `outputTokens` | Number of output tokens generated |
| `cacheReadTokens` | Tokens served from provider cache |
| `cacheWriteTokens` | Tokens written to provider cache |
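The fields above can be modeled as a small interface. This is a sketch of the shape described in the table, not the exact type exported by pi-ai, and the `totalTokens` helper is added here purely for illustration:

```typescript
// Sketch of the usage shape described above; the exact exported
// type name in pi-ai may differ.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Illustrative helper: total tokens involved in a call.
function totalTokens(u: Usage): number {
  return u.inputTokens + u.outputTokens + u.cacheReadTokens + u.cacheWriteTokens;
}

const usage: Usage = {
  inputTokens: 1250,
  outputTokens: 450,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
};
console.log(totalTokens(usage)); // → 1700
```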
### Cost Calculation
Cost is calculated using the model’s pricing metadata:
```typescript
import { estimateCost, resolveModel } from "polpo/llm";

const model = resolveModel("anthropic:claude-sonnet-4-6");
const cost = estimateCost(model, {
  inputTokens: 10000,
  outputTokens: 2000,
  cacheReadTokens: 5000,
  cacheWriteTokens: 0,
});

console.log(cost);
// → {
//   inputCost: 0.03,
//   outputCost: 0.03,
//   cacheReadCost: 0.0015,
//   cacheWriteCost: 0,
//   totalCost: 0.0615,
//   currency: "USD"
// }
```
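The arithmetic behind that result is straightforward: each token count is divided by one million and multiplied by the model's per-million rate. A minimal sketch using the claude-sonnet-4-6 rates listed in the tables below (the `costFor` helper is illustrative, not part of the polpo API):

```typescript
// Per-million-token rates for claude-sonnet-4-6, as in the catalog tables below.
const rates = { input: 3.0, output: 15.0, cacheRead: 0.3 };

// Illustrative helper: dollars for a token count at a $/1M rate.
const costFor = (tokens: number, ratePerMillion: number): number =>
  (tokens / 1_000_000) * ratePerMillion;

const inputCost = costFor(10_000, rates.input);        // 0.03
const outputCost = costFor(2_000, rates.output);       // 0.03
const cacheReadCost = costFor(5_000, rates.cacheRead); // 0.0015
const totalCost = inputCost + outputCost + cacheReadCost;
console.log(totalCost.toFixed(4)); // → "0.0615"
```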
## Detailed vs Simple Queries
Polpo provides two query patterns:
### Simple (text only)

```typescript
import { querySDKText } from "polpo/llm";

// Returns just the text string; no usage or cost data
const text = await querySDKText(prompt, cwd, model);
```
### Detailed (text, usage, and cost)

```typescript
import { querySDKTextDetailed, querySDKStreamDetailed } from "polpo/llm";

// Returns text + usage + cost + model metadata
const result = await querySDKTextDetailed(prompt, cwd, model);

result.text;    // string
result.usage;   // Usage | undefined
result.costUsd; // number | undefined
result.model;   // Model object with full metadata

// Streaming variant
const streamResult = await querySDKStreamDetailed(prompt, cwd, model, (delta) => {
  process.stdout.write(delta);
});
```
### With Fallback
```typescript
import { querySDKWithFallback } from "polpo/llm";

const result = await querySDKWithFallback(prompt, cwd, {
  primary: "anthropic:claude-sonnet-4-6",
  fallbacks: ["openai:gpt-4o"]
});

result.usedSpec; // Which model was actually used
result.costUsd;  // Cost based on the model that actually ran
```
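Conceptually, a fallback chain tries each model spec in order and records which one actually ran. The following is a hedged sketch of that control flow, not the actual querySDKWithFallback implementation:

```typescript
// Illustrative fallback loop: try each spec until one succeeds.
async function withFallback<T>(
  specs: string[],
  run: (spec: string) => Promise<T>
): Promise<{ usedSpec: string; result: T }> {
  let lastError: unknown;
  for (const spec of specs) {
    try {
      return { usedSpec: spec, result: await run(spec) };
    } catch (err) {
      lastError = err; // this model failed; try the next one
    }
  }
  throw lastError; // every model in the chain failed
}
```

Because `usedSpec` records the model that actually ran, downstream cost accounting stays accurate even when the primary is unavailable.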
## Cost by Provider
Reference pricing for popular models (as of catalog data):
### Anthropic

| Model | Input $/1M | Output $/1M | Cache Read $/1M |
|---|---|---|---|
| `claude-opus-4-6` | $15.00 | $75.00 | $1.50 |
| `claude-sonnet-4-6` | $3.00 | $15.00 | $0.30 |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 | $0.08 |
### OpenAI

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `o3` | $10.00 | $40.00 |
| `o4-mini` | $1.10 | $4.40 |
### Google

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.15 | $0.60 |
### Free Models

| Model | Cost |
|---|---|
| `opencode:big-pickle` | Free |
| Any Ollama/vLLM local model | Free |
Pricing data comes from the pi-ai catalog and may not reflect the latest changes from providers. Always check the provider’s official pricing page for the most current rates.
## Cost Optimization Strategies
### 1. Use the Right Model for the Job

Don't use `claude-opus-4-6` for formatting tasks. Match model capability to task complexity:
```json
{
  "agents": [
    { "name": "architect", "model": "anthropic:claude-opus-4-6" },
    { "name": "coder", "model": "anthropic:claude-sonnet-4-6" },
    { "name": "formatter", "model": "anthropic:claude-haiku-4-5-20251001" }
  ]
}
```
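To see why this matters, compare the same small task priced at opus versus haiku rates from the tables above. The token counts here are invented for illustration:

```typescript
// Illustrative $/1M helper and a hypothetical small formatting task.
const perMillion = (tokens: number, rate: number): number =>
  (tokens / 1_000_000) * rate;
const task = { inputTokens: 5_000, outputTokens: 1_000 };

// claude-opus-4-6: $15/1M input, $75/1M output
const opusCost = perMillion(task.inputTokens, 15.0) + perMillion(task.outputTokens, 75.0);
// claude-haiku-4-5: $0.80/1M input, $4/1M output
const haikuCost = perMillion(task.inputTokens, 0.8) + perMillion(task.outputTokens, 4.0);

console.log(opusCost.toFixed(3));                  // → "0.150"
console.log(haikuCost.toFixed(3));                 // → "0.008"
console.log(Math.round(opusCost / haikuCost));     // → 19 (roughly 19x more expensive)
```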
### 2. Use Local Models for Simple Tasks

For boilerplate, formatting, and linting fixes, use a local model:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions"
    }
  },
  "agents": [
    { "name": "simple-tasks", "model": "ollama:qwen2.5-coder:7b" }
  ]
}
```
### 3. Use Fallback Chains with Cost Awareness
Put cheaper models in the fallback chain:
```json
{
  "settings": {
    "orchestratorModel": {
      "primary": "anthropic:claude-sonnet-4-6",
      "fallbacks": [
        "google:gemini-2.5-flash",
        "groq:llama-3.3-70b-versatile"
      ]
    }
  }
}
```
### 4. Use the Model Allowlist

Restrict which models can be used to prevent accidental use of expensive models:
```json
{
  "settings": {
    "modelAllowlist": {
      "anthropic:claude-sonnet-4-6": { "alias": "Sonnet" },
      "anthropic:claude-haiku-4-5-20251001": { "alias": "Haiku" },
      "groq:llama-3.3-70b-versatile": { "alias": "Llama" }
    }
  }
}
```
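Enforcement of an allowlist amounts to a lookup against the configured map. This helper is illustrative only; Polpo's actual enforcement lives inside the orchestrator:

```typescript
// Allowlist shape matching the settings example above.
const modelAllowlist: Record<string, { alias: string }> = {
  "anthropic:claude-sonnet-4-6": { alias: "Sonnet" },
  "anthropic:claude-haiku-4-5-20251001": { alias: "Haiku" },
  "groq:llama-3.3-70b-versatile": { alias: "Llama" },
};

// Illustrative check: a spec is usable only if it appears in the allowlist.
function isAllowed(spec: string): boolean {
  return Object.prototype.hasOwnProperty.call(modelAllowlist, spec);
}

console.log(isAllowed("anthropic:claude-sonnet-4-6")); // → true
console.log(isAllowed("anthropic:claude-opus-4-6"));   // → false
```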
### 5. Leverage Prompt Caching
Models that support prompt caching (Anthropic, OpenAI) can significantly reduce costs for repeated similar prompts: cache reads are typically priced 90% below regular input tokens.
Cache read pricing:
| Model | Regular Input | Cache Read | Savings |
|---|---|---|---|
| `claude-sonnet-4-6` | $3.00/1M | $0.30/1M | 90% |
| `claude-opus-4-6` | $15.00/1M | $1.50/1M | 90% |
| `claude-haiku-4-5-20251001` | $0.80/1M | $0.08/1M | 90% |
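The savings column follows directly from the two rates. A worked check using the Sonnet figures above:

```typescript
// Sonnet rates from the table above, in $/1M tokens.
const regularInputRate = 3.0;
const cacheReadRate = 0.3;

// Cached tokens cost a tenth of regular input tokens.
const savingsPct = Math.round((1 - cacheReadRate / regularInputRate) * 100);
console.log(`${savingsPct}% cheaper for cache reads`); // → "90% cheaper for cache reads"
```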