Core Concepts

Polpo is an AI agent that manages a team of other AI agents. You talk to Polpo, and it plans, delegates, reviews, and delivers. Here are the core primitives.

Agents

An agent is a named AI worker. You give it a model and a role, and can optionally restrict which tools and files it can access. Agents don’t know about each other; Polpo handles all coordination. Each agent runs as a detached subprocess, so if Polpo crashes, agents keep working. On restart, Polpo reconnects to live processes.

{
  "name": "backend-dev",
  "role": "developer",
  "model": "anthropic:claude-sonnet-4-20250514",
  "allowedTools": ["read", "write", "edit", "bash", "glob", "grep"],
  "allowedPaths": ["./src", "./tests"]
}

See Agents for the full config reference.

Teams

Every Polpo project has a team — the group of agents available to work on tasks. Even a single-agent setup requires a team definition. polpo init creates one automatically with a default agent.

{
  "team": {
    "name": "api-builders",
    "agents": [
      { "name": "backend", "model": "anthropic:claude-sonnet-4-20250514", "role": "developer" },
      { "name": "reviewer", "model": "openai:gpt-4o", "role": "code reviewer" }
    ]
  }
}

Missions can also define volatile agents — temporary team members that exist only for the duration of that mission and are cleaned up automatically. See Teams for management, volatile teams, and runtime API.

Tasks

A task is a unit of work assigned to an agent. Every task moves through a state machine:

pending → assigned → in_progress → review → done / failed

When a task fails assessment, it enters a fix phase: the agent receives targeted feedback (per-dimension scores and reasoning) and corrects its work without starting over. Once fix attempts are exhausted, a full retry kicks in. Once retries are exhausted, escalation takes over. Tasks can also pass through awaiting_approval (when approval gates are configured) and clarification (when an agent asks a question that Polpo answers automatically). See Tasks for states, phases, and transitions.
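
The lifecycle above can be pictured as a small transition table plus a rule for what happens after a failed assessment. This is an illustrative sketch only, not Polpo's implementation: ALLOWED, can_transition, next_step, and the max_fixes / max_retries limits are hypothetical names and values, and the optional awaiting_approval and clarification states are left out.

```python
# Illustrative sketch; names and limits are assumptions, not Polpo's API.

# Main path from the text: pending -> assigned -> in_progress -> review -> done / failed
ALLOWED = {
    "pending": {"assigned"},
    "assigned": {"in_progress"},
    "in_progress": {"review"},
    "review": {"done", "failed"},
    "done": set(),
    "failed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the main-path state machine allows current -> target."""
    return target in ALLOWED.get(current, set())


def next_step(fixes_used: int, retries_used: int,
              max_fixes: int = 2, max_retries: int = 1) -> str:
    """Choose the recovery step after a failed assessment.

    Order follows the text: targeted fix first, then full retry, then escalation.
    The default limits are made up for illustration.
    """
    if fixes_used < max_fixes:
        return "fix"
    if retries_used < max_retries:
        return "retry"
    return "escalate"
```

With these assumed limits, a task that has used both fix attempts but no retries would get a full retry next, and would escalate only after that retry is also used up.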

Missions

A mission is a group of tasks with dependencies. You can ask Polpo to generate a mission from a description, or define missions yourself as JSON files in .polpo/missions/. Polpo resolves the dependency graph, parallelizes what it can, and sequences the rest.

{
  "group": "feature-auth",
  "tasks": [
    { "title": "Design schema", "assignTo": "backend" },
    { "title": "Implement API", "assignTo": "backend", "dependsOn": ["Design schema"] },
    { "title": "Write tests", "assignTo": "reviewer", "dependsOn": ["Implement API"] }
  ]
}

Missions can include quality gates, checkpoints, scheduling, deadlines, and volatile teams scoped to the mission. See Missions for the full format and lifecycle.
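
One way to picture dependency resolution is as grouping tasks into waves, where every task in a wave has all of its dependencies already finished and can therefore run in parallel with the others in that wave. The sketch below does this for the example mission. plan_waves is a hypothetical helper written for illustration; Polpo's actual scheduler may work differently.

```python
# Illustrative sketch; plan_waves is a hypothetical helper, not part of Polpo.

def plan_waves(tasks: list) -> list:
    """Group task titles into waves that could run in parallel.

    A task is ready once every title in its "dependsOn" list is done.
    """
    remaining = {t["title"]: set(t.get("dependsOn", [])) for t in tasks}
    done = set()
    waves = []
    while remaining:
        # Every task whose dependencies are all satisfied joins this wave.
        ready = sorted(title for title, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle or a dependency on an unknown task.
            raise ValueError("unresolvable dependencies: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for title in ready:
            del remaining[title]
    return waves


mission = {
    "group": "feature-auth",
    "tasks": [
        {"title": "Design schema", "assignTo": "backend"},
        {"title": "Implement API", "assignTo": "backend", "dependsOn": ["Design schema"]},
        {"title": "Write tests", "assignTo": "reviewer", "dependsOn": ["Implement API"]},
    ],
}

waves = plan_waves(mission["tasks"])
```

For this mission every task depends on the previous one, so each wave holds a single task. Two tasks with no dependency between them would land in the same wave.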

Assessment

Every completed task goes through Polpo’s G-Eval assessment system — an LLM-as-judge pipeline that scores agent work across multiple dimensions. Three independent reviewer agents evaluate the output in parallel, each with access to the codebase via tools (read_file, glob, grep). They score four default dimensions:

Dimension	Weight	What it measures
Correctness	35%	Logic, runtime behavior, no regressions
Completeness	30%	All requirements addressed
Code quality	20%	Structure, readability, maintainability
Edge cases	15%	Error handling, boundary conditions

Polpo computes a consensus score (median across reviewers, outliers excluded) on a 1–5 scale. If the score is below the threshold (default: 3.0), the task enters a fix phase with specific feedback — not a blind retry. You can also define concrete expectations per task: test commands to run, files that must exist, or custom LLM review prompts — each with a weight that contributes to the final score. See Review for the pipeline details and Scoring for dimensions, rubrics, and quality gates.
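
To make the arithmetic concrete, the sketch below combines the four default weights into one score per reviewer and takes the median across three reviewers. Several things here are assumptions: the reviewer scores are made up, the function names are hypothetical, and the median is taken over each reviewer's weighted total (the text does not say whether Polpo takes it per dimension instead). Outlier exclusion is omitted because the text does not describe how outliers are detected.

```python
# Illustrative sketch; function names and sample scores are assumptions.
from statistics import median

# Default dimension weights from the table above.
WEIGHTS = {
    "correctness": 0.35,
    "completeness": 0.30,
    "code_quality": 0.20,
    "edge_cases": 0.15,
}


def weighted_score(scores: dict) -> float:
    """Combine one reviewer's 1-5 dimension scores into a single weighted score."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def consensus(reviews: list, threshold: float = 3.0) -> tuple:
    """Return (consensus score, passed) using the median across reviewers."""
    value = median(weighted_score(r) for r in reviews)
    return value, value >= threshold


reviews = [
    {"correctness": 4, "completeness": 4, "code_quality": 3, "edge_cases": 3},
    {"correctness": 3, "completeness": 3, "code_quality": 3, "edge_cases": 2},
    {"correctness": 2, "completeness": 3, "code_quality": 2, "edge_cases": 2},
]

score, passed = consensus(reviews)
```

Here the three weighted totals work out to roughly 3.65, 2.85, and 2.30, so the consensus is about 2.85. That is below the default threshold of 3.0, so under these assumptions the task would enter a fix phase.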

Sessions

Polpo maintains persistent chat sessions — every conversation (CLI, TUI, Web UI, Telegram, API) is stored and can be resumed within a configurable idle window (default: 30 minutes). See Sessions for persistence, transcripts, and activity tracking.
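
The idle-window rule can be sketched as a simple timestamp comparison. can_resume is a hypothetical helper, and treating a session that is exactly at the limit as still resumable is an assumption; only the 30-minute default comes from the text above.

```python
# Illustrative sketch; can_resume is a hypothetical helper, not Polpo's API.
from datetime import datetime, timedelta

DEFAULT_IDLE_WINDOW = timedelta(minutes=30)  # default stated in the text


def can_resume(last_activity: datetime, now: datetime,
               idle_window: timedelta = DEFAULT_IDLE_WINDOW) -> bool:
    """Return True if the session was last active within the idle window."""
    return now - last_activity <= idle_window


last_seen = datetime(2025, 1, 1, 12, 0)
resumable = can_resume(last_seen, datetime(2025, 1, 1, 12, 20))  # 20 minutes idle
expired = can_resume(last_seen, datetime(2025, 1, 1, 12, 45))    # 45 minutes idle
```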

Memory

Polpo has two levels of persistent context that carry across sessions:

Project memory (.polpo/memory.md) — facts about your project that Polpo and its agents should always know (architecture decisions, conventions, key file locations). Agents receive this as context when they start working.
System context (.polpo/system-context.md) — standing instructions for Polpo itself (behavior preferences, team policies, escalation rules). Injected into every Polpo conversation.

You can edit memory via chat (“remember that we use PostgreSQL”), the TUI (/memory), or the API. See Memory for configuration and usage.
