Quick Start
Config
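A minimal provider entry might look like the sketch below. The key names (`providers`, `api`, `baseUrl`, `models`) are hypothetical — check Polpo's config schema for the real ones — but the values mirror the Provider Details table further down.

```json
{
  "providers": {
    "ollama": {
      "api": "openai-completions",
      "baseUrl": "http://localhost:11434/v1",
      "models": ["qwen2.5-coder:32b"]
    }
  }
}
```

No API key entry is needed, since Ollama does not check one.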
Use it
Auto-Discovery
Polpo can discover your Ollama models automatically with polpo models scan. The scan connects to localhost:11434 and lists all available models with their capabilities.
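Under the hood, a scan amounts to querying Ollama's /api/tags endpoint and reading the model list out of the response. The sketch below parses an embedded sample response instead of making a live request; the response shape follows Ollama's API, while the function name is just for illustration:

```python
import json

# Sample /api/tags response (shape per Ollama's API; sizes illustrative).
sample = json.loads("""
{"models": [
  {"name": "qwen2.5-coder:7b", "size": 4683087332},
  {"name": "llama3.1:8b", "size": 4661224676}
]}
""")

def model_names(tags_response: dict) -> list[str]:
    """Extract the installed model tags from an /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

print(model_names(sample))  # ['qwen2.5-coder:7b', 'llama3.1:8b']
```

A live check of the same endpoint is simply `curl http://localhost:11434/api/tags`.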
Recommended Models
| Model | Size | Best for |
|---|---|---|
| qwen2.5-coder:32b | 32B | Best open coding model |
| qwen2.5-coder:7b | 7B | Fast coding, lower quality |
| llama3.1:70b | 70B | General purpose, strong reasoning |
| llama3.1:8b | 8B | Fast general purpose |
| deepseek-coder-v2:16b | 16B | Good code generation |
| codestral:22b | 22B | Mistral’s code model |
Remote Ollama
If Ollama runs on a different machine (e.g. a GPU server), point the base URL at that host instead of localhost:11434.

Fallback Pattern: Local First, Cloud Backup
A common pattern is to try the local model first, then fall back to a cloud provider if it’s unavailable.
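The pattern is an ordinary try/fall-through. The sketch below uses hypothetical ask_local/ask_cloud helpers rather than Polpo's real API, with stubs standing in for the actual model calls:

```python
def ask_local(prompt: str) -> str:
    # Stand-in for a call to the local Ollama model; here it always
    # fails, simulating a server that is not running.
    raise ConnectionError("Ollama is not running")

def ask_cloud(prompt: str) -> str:
    # Stand-in for a call to a cloud provider.
    return f"cloud answer to: {prompt}"

def ask(prompt: str) -> str:
    """Try the local model first; fall back to the cloud on failure."""
    try:
        return ask_local(prompt)
    except (ConnectionError, TimeoutError):
        return ask_cloud(prompt)

print(ask("hello"))  # the local stub fails, so the cloud stub answers
```

In a real setup the except clause is where you would log the failure and switch provider IDs.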
Provider Details

| Setting | Value |
|---|---|
| Provider ID | ollama (custom) |
| Default port | 11434 |
| API type | openai-completions |
| Base URL | http://localhost:11434/v1 |
| API key | Not required |
| Cost | Free (runs locally) |
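Because the API type is openai-completions, any OpenAI-style client can talk to this endpoint. The sketch below only constructs the request such a client would POST; the model name is illustrative:

```python
import json

BASE_URL = "http://localhost:11434/v1"

# OpenAI-style chat-completions payload; no Authorization header is
# needed because Ollama ignores API keys.
payload = {
    "model": "qwen2.5-coder:7b",  # must match an installed Ollama tag
    "messages": [{"role": "user", "content": "Say hello."}],
}

url = f"{BASE_URL}/chat/completions"
body = json.dumps(payload)
print(url)  # http://localhost:11434/v1/chat/completions
```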
Troubleshooting
Connection refused
Make sure Ollama is running: ollama serve. Check that the port is reachable with curl http://localhost:11434/api/tags.
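If you prefer to script the check, a plain TCP probe tells you whether anything is listening on the Ollama port (a sketch; the curl command above does the same job):

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 1.0) -> bool:
    """Return True if something is listening on the given host/port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```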
Model not found
Pull the model first: ollama pull qwen2.5-coder:32b. The model ID in your config must exactly match the Ollama tag name.
Out of memory
Large models (32B+) need significant VRAM. Check your GPU memory with nvidia-smi. If you’re short on memory, consider a smaller model such as qwen2.5-coder:7b.
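As a rough rule of thumb (an assumption, not an official figure): a 4-bit-quantized model needs about half a byte per parameter, plus some overhead for the KV cache and buffers. A quick back-of-the-envelope calculation:

```python
def vram_gb(params_billion: float,
            bytes_per_param: float = 0.5,  # ~4-bit quantization (assumed)
            overhead: float = 1.2) -> float:  # ~20% KV cache/buffers (assumed)
    """Rough VRAM estimate in GB for a quantized model."""
    return params_billion * bytes_per_param * overhead

print(round(vram_gb(32), 1))  # 19.2 -> a 32B model wants ~20 GB
print(round(vram_gb(7), 1))   # 4.2  -> a 7B model fits on modest GPUs
```

Compare the estimate against the free memory nvidia-smi reports before pulling a large model.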
Slow inference
Ollama performance depends on your hardware. For faster inference, ensure you’re using GPU acceleration (CUDA/ROCm). CPU-only inference is 10-50x slower.