Quick Start
Config
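A minimal provider entry might look like the sketch below. The key names (`providers`, `api`, `baseUrl`, `models`) are hypothetical — check Polpo's config schema for the real ones — but the values mirror the Provider Details table further down.

```json
{
  "providers": {
    "ollama": {
      "api": "openai-completions",
      "baseUrl": "http://localhost:11434/v1",
      "models": ["qwen2.5-coder:32b"]
    }
  }
}
```

No API key entry is needed, since Ollama does not check one.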
Use it
Auto-Discovery
Polpo can discover your Ollama models automatically with polpo models scan. The scan connects to localhost:11434 and lists all available models with their capabilities.
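Under the hood, a scan amounts to querying Ollama's /api/tags endpoint and reading the model list out of the response. The sketch below parses an embedded sample response instead of making a live request; the response shape follows Ollama's API, while the function name is just for illustration:

```python
import json

# Sample /api/tags response (shape per Ollama's API; sizes illustrative).
sample = json.loads("""
{"models": [
  {"name": "qwen2.5-coder:7b", "size": 4683087332},
  {"name": "llama3.1:8b", "size": 4661224676}
]}
""")

def model_names(tags_response: dict) -> list[str]:
    """Extract the installed model tags from an /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

print(model_names(sample))  # ['qwen2.5-coder:7b', 'llama3.1:8b']
```

A live check of the same endpoint is simply `curl http://localhost:11434/api/tags`.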
Recommended Models
| Model | Size | Best for |
|---|---|---|
| qwen2.5-coder:32b | 32B | Best open coding model |
| qwen2.5-coder:7b | 7B | Fast coding, lower quality |
| llama3.1:70b | 70B | General purpose, strong reasoning |
| llama3.1:8b | 8B | Fast general purpose |
| deepseek-coder-v2:16b | 16B | Good code generation |
| codestral:22b | 22B | Mistral’s code model |
Remote Ollama
If Ollama runs on a different machine (e.g. a GPU server), point the base URL at that host instead of localhost:11434.

Fallback Pattern: Local First, Cloud Backup
A common pattern is to try the local model first, then fall back to a cloud provider if it’s unavailable.
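The pattern is an ordinary try/fall-through. The sketch below uses hypothetical ask_local/ask_cloud helpers rather than Polpo's real API, with stubs standing in for the actual model calls:

```python
def ask_local(prompt: str) -> str:
    # Stand-in for a call to the local Ollama model; here it always
    # fails, simulating a server that is not running.
    raise ConnectionError("Ollama is not running")

def ask_cloud(prompt: str) -> str:
    # Stand-in for a call to a cloud provider.
    return f"cloud answer to: {prompt}"

def ask(prompt: str) -> str:
    """Try the local model first; fall back to the cloud on failure."""
    try:
        return ask_local(prompt)
    except (ConnectionError, TimeoutError):
        return ask_cloud(prompt)

print(ask("hello"))  # the local stub fails, so the cloud stub answers
```

In a real setup the except clause is where you would log the failure and switch provider IDs.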
Provider Details

| Setting | Value |
|---|---|
| Provider ID | ollama (custom) |
| Default port | 11434 |
| API type | openai-completions |
| Base URL | http://localhost:11434/v1 |
| API key | Not required |
| Cost | Free (runs locally) |
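Because the API type is openai-completions, any OpenAI-style client can talk to this endpoint. The sketch below only constructs the request such a client would POST; the model name is illustrative:

```python
import json

BASE_URL = "http://localhost:11434/v1"

# OpenAI-style chat-completions payload; no Authorization header is
# needed because Ollama ignores API keys.
payload = {
    "model": "qwen2.5-coder:7b",  # must match an installed Ollama tag
    "messages": [{"role": "user", "content": "Say hello."}],
}

url = f"{BASE_URL}/chat/completions"
body = json.dumps(payload)
print(url)  # http://localhost:11434/v1/chat/completions
```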
Troubleshooting
Connection refused
Make sure Ollama is running: ollama serve. Check that the port is reachable with curl http://localhost:11434/api/tags.
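If you prefer to script the check, a plain TCP probe tells you whether anything is listening on the Ollama port (a sketch; the curl command above does the same job):

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 1.0) -> bool:
    """Return True if something is listening on the given host/port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```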
Model not found
Pull the model first: ollama pull qwen2.5-coder:32b. The model ID in your config must exactly match the Ollama tag name.
Out of memory
Large models (32B+) need significant VRAM. Check your GPU memory with nvidia-smi. If you’re short on memory, consider a smaller model such as qwen2.5-coder:7b.
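As a rough rule of thumb (an assumption, not an official figure): a 4-bit-quantized model needs about half a byte per parameter, plus some overhead for the KV cache and buffers. A quick back-of-the-envelope calculation:

```python
def vram_gb(params_billion: float,
            bytes_per_param: float = 0.5,  # ~4-bit quantization (assumed)
            overhead: float = 1.2) -> float:  # ~20% KV cache/buffers (assumed)
    """Rough VRAM estimate in GB for a quantized model."""
    return params_billion * bytes_per_param * overhead

print(round(vram_gb(32), 1))  # 19.2 -> a 32B model wants ~20 GB
print(round(vram_gb(7), 1))   # 4.2  -> a 7B model fits on modest GPUs
```

Compare the estimate against the free memory nvidia-smi reports before pulling a large model.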
Slow inference
Ollama performance depends on your hardware. For faster inference, ensure you’re using GPU acceleration (CUDA/ROCm). CPU-only inference is 10-50x slower.