How Custom Providers Work
A custom provider is any entry in the `providers` config that includes a `baseUrl`, and optionally an `api` compatibility mode and a `models` list:
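A minimal sketch of what such an entry might look like (the JSON shape and the provider key `my-local` are illustrative; `baseUrl`, `api`, and `models` are the fields described in this guide):

```json
{
  "providers": {
    "my-local": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "models": [
        { "id": "qwen2.5-coder:32b", "contextWindow": 32768 }
      ]
    }
  }
}
```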
API Compatibility Modes
Custom endpoints must be compatible with one of these API formats:
| Mode | Description | Compatible With |
|---|---|---|
| `openai-completions` | OpenAI Chat Completions API (`/v1/chat/completions`) | Ollama, vLLM, LM Studio, LiteLLM, text-generation-inference, LocalAI, FastChat |
| `openai-responses` | OpenAI Responses API (newer format) | OpenAI direct, some proxies |
| `anthropic-messages` | Anthropic Messages API (`/v1/messages`) | Anthropic proxies, AWS Bedrock wrappers |
If `api` is omitted, Polpo uses the provider's default. For custom providers, you almost always want `openai-completions`.
Custom Model Definitions
Since custom providers aren't in the pi-ai catalog, Polpo needs model metadata to be defined inline. If you omit `models`, Polpo will still route requests to the endpoint, but cost tracking and model metadata won't be available.
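A sketch of an inline model definition (`id`, `contextWindow`, and `input` appear elsewhere in this guide; the exact shape of the cost fields is an assumption):

```json
{
  "id": "qwen2.5-coder:32b",
  "contextWindow": 32768,
  "input": ["text"],
  "cost": { "input": 0, "output": 0 }
}
```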
Ollama
Ollama serves local models with an OpenAI-compatible API.
Setup
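For example, to pull a model and start the server (the model tag is just one of the recommendations below):

```shell
# Pull a model from the Ollama registry
ollama pull qwen2.5-coder:32b

# Start the server (listens on http://localhost:11434 by default;
# skip this if Ollama already runs as a background service)
ollama serve
```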
Configuration
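Assuming Ollama's default port, a provider entry might look like this (the key name `ollama` is arbitrary; Ollama exposes its OpenAI-compatible API under `/v1`):

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "models": [
        { "id": "qwen2.5-coder:32b", "contextWindow": 32768 }
      ]
    }
  }
}
```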
Recommended Ollama Models
| Model | Size | Best For |
|---|---|---|
| `qwen2.5-coder:32b` | 32B | Best open coding model |
| `qwen2.5-coder:7b` | 7B | Fast coding, lower quality |
| `llama3.1:70b` | 70B | General purpose, strong reasoning |
| `llama3.1:8b` | 8B | Fast general purpose |
| `deepseek-coder-v2:16b` | 16B | Good code generation |
| `codestral:22b` | 22B | Mistral's code model |
vLLM
vLLM is a high-throughput inference engine with OpenAI-compatible serving.
Setup
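For example (the model choice is illustrative; vLLM generally requires a CUDA-capable GPU):

```shell
# Install vLLM
pip install vllm

# Serve a model; the OpenAI-compatible server listens on port 8000 by default
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct
```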
Configuration
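Assuming vLLM's default port 8000; note the `id` is the full HuggingFace path used at startup:

```json
{
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "models": [
        { "id": "Qwen/Qwen2.5-Coder-32B-Instruct", "contextWindow": 32768 }
      ]
    }
  }
}
```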
With vLLM, the model ID must match the exact model name you used when starting the server. Check `vllm serve --help` for serving options.
LM Studio
LM Studio provides a GUI for running local models with an OpenAI-compatible server.
Setup
- Download and install LM Studio
- Load a model in the GUI
- Start the local server (Settings > Local Server > Start)
Configuration
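LM Studio's local server defaults to port 1234; the model `id` should match the identifier shown in the LM Studio UI (the one below is illustrative):

```json
{
  "providers": {
    "lmstudio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "models": [
        { "id": "qwen2.5-coder-32b-instruct", "contextWindow": 32768 }
      ]
    }
  }
}
```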
LiteLLM Proxy
LiteLLM is a proxy that unifies 100+ LLM providers behind a single OpenAI-compatible endpoint.
Configuration
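Assuming the LiteLLM proxy is running on its default port 4000; the model `id`s are whatever names you defined in your LiteLLM config (`my-routed-model` below is a placeholder):

```json
{
  "providers": {
    "litellm": {
      "baseUrl": "http://localhost:4000/v1",
      "api": "openai-completions",
      "models": [
        { "id": "my-routed-model", "contextWindow": 128000 }
      ]
    }
  }
}
```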
text-generation-inference (TGI)
Hugging Face's TGI serves models with OpenAI-compatible endpoints.
Configuration
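A sketch, assuming TGI is published on port 8080 (TGI exposes OpenAI-compatible routes under `/v1`; the `id` value is illustrative):

```json
{
  "providers": {
    "tgi": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "models": [
        { "id": "tgi", "contextWindow": 32768 }
      ]
    }
  }
}
```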
Using Custom Providers with Fallback
Custom providers work with fallback chains. A common pattern is to try a local model first, then fall back to a cloud provider:
Tips
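The exact fallback syntax depends on how fallback chains are configured in your Polpo setup; as a hypothetical sketch (the `fallback` key and the model names below are assumptions, not confirmed config):

```json
{
  "model": "ollama/qwen2.5-coder:32b",
  "fallback": ["anthropic/claude-sonnet"]
}
```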
Model IDs must match exactly
The `id` in your custom model definition must exactly match what the server expects. For Ollama, it's the tag name (`qwen2.5-coder:32b`). For vLLM, it's the full HuggingFace model path (`Qwen/Qwen2.5-Coder-32B-Instruct`).
Cost tracking for local models
Set all cost fields to `0` for local models. This prevents cost tracking from showing misleading numbers. If you're running on rented GPU infrastructure, you can estimate cost per token and fill in the values for accurate tracking.
Context window matters
Set `contextWindow` accurately for your model. Polpo uses this to decide whether to truncate prompts. If you set it too high, the model may receive prompts that exceed its actual capacity and produce errors.
Vision support
If your local model supports images (e.g. LLaVA, Qwen-VL), set `input: ["text", "image"]` in the model definition and configure it as your `imageModel`.
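A sketch of a vision-capable model definition (the tag and context size are illustrative):

```json
{
  "id": "llava:13b",
  "contextWindow": 4096,
  "input": ["text", "image"]
}
```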