coolify-mcp is the layer that translates between them, and it does it with 90 to 99% less noise. Here is exactly how.
{
  "id": 142,
  "uuid": "g4sk4ckcw080osckos48sswo",
  "name": "stuartmason.co.uk",
  "description": "Marketing site + blog",
  "fqdn": "https://stuartmason.co.uk,https://www.stuartmason.co.uk",
  "config_hash": "a1b2c3d4e5f6...",
  "git_repository": "StuMason/stuartmason",
  "git_branch": "main",
  "git_commit_sha": "HEAD",
  "build_pack": "nixpacks",
  "static_image": "nginx:alpine",
  "install_command": "npm ci",
  "build_command": "npm run build",
  "start_command": null,
  "ports_exposes": "3000",
  "ports_mappings": null,
  "base_directory": "/",
  "publish_directory": "/",
  "health_check_enabled": true,
  "health_check_path": "/up",
  "health_check_port": null,
  "health_check_host": null,
  "health_check_method": "GET",
  "health_check_return_code": 200,
  "health_check_scheme": "http",
  "health_check_response_text": null,
  "health_check_interval": 30,
  "health_check_timeout": 30,
  "limits_memory": "0",
  "limits_memory_swap": "0",
  "limits_cpus": "0",
  "dockerfile": "FROM php:8.4-fpm\nRUN docker-php-ext-install ... (47KB) ...",
  "docker_compose_raw": "services:\n app:\n build: . ... (3KB) ...",
  "server": { "id": 1, "name": "apps", "ip": "138.199.216.202", "... 38 more fields": "..." },
  "destination": { "id": 1, "network": "coolify", "... 12 more fields": "..." },
  "... 53 more fields": "..."
}
{
  "uuid": "g4sk4ckcw080osckos48sswo",
  "name": "stuartmason.co.uk",
  "status": "running:healthy",
  "fqdn": "https://stuartmason.co.uk",
  "git": "StuMason/stuartmason@main",
  "_actions": { "restart": "control(restart)", "logs": "application_logs(uuid)" }
}
A two-minute walkthrough of the token-collapse problem and how coolify-mcp solves it, in the actual codebase.
Wiring an API to an agent is easy. Keeping it inside a finite, expensive context window is the part that takes judgement.
A single app listing is 91 fields, with a 47KB Dockerfile and a 3KB compose string buried inside. Sent raw, it floods the model. A two-tier projection layer returns a 5-field summary from list calls and full detail only on request. A 1MB deployment list becomes 4KB.
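The two-tier idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the field names come from the sample payloads on this page, while `RawApp`, `AppSummary`, `summariseApp` and `projectApps` are hypothetical names, and keeping only the first domain from `fqdn` is an assumption based on the before and after samples.

```typescript
// Hypothetical shapes. Only the field names are taken from the sample payloads.
type RawApp = {
  uuid: string;
  name: string;
  status?: string;
  fqdn?: string | null;
  git_repository?: string | null;
  git_branch?: string | null;
  [extra: string]: unknown; // dockerfile, docker_compose_raw, server, ...
};

type AppSummary = {
  uuid: string;
  name: string;
  status: string;
  fqdn: string | null;
  git: string | null;
  _actions: Record<string, string>;
};

// Tier 1: list calls return only this compact projection.
function summariseApp(app: RawApp): AppSummary {
  // Assumption: keep the first domain of a comma-separated fqdn string.
  const firstDomain = app.fqdn ? (app.fqdn.split(",")[0] ?? "").trim() : null;
  const git = app.git_repository
    ? `${app.git_repository}@${app.git_branch ?? "HEAD"}`
    : null;
  return {
    uuid: app.uuid,
    name: app.name,
    status: app.status ?? "unknown",
    fqdn: firstDomain,
    git,
    _actions: { restart: "control(restart)", logs: "application_logs(uuid)" },
  };
}

// Tier 2: the full record is returned only when one app is asked for by uuid.
function projectApps(apps: RawApp[], detailUuid?: string): AppSummary[] | RawApp {
  if (detailUuid !== undefined) {
    const match = apps.find((a) => a.uuid === detailUuid);
    if (!match) throw new Error(`No application with uuid ${detailUuid}`);
    return match;
  }
  return apps.map(summariseApp);
}
```

Because the summary is built field by field rather than by deleting keys from the raw record, a new heavy field added upstream cannot leak into list responses by accident.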
Every tool's schema ships to the model on every turn. Sixty granular CRUD tools cost roughly 43,000 tokens before the user says a word. Consolidating to 42 action-parameter tools cut that to 6,600. An 85% saving, every turn.
Three of the heaviest endpoints, before and after the projection layer.
{
  "uuid": "g4sk4ckcw080osckos48sswo",
  "name": "stuartmason.co.uk",
  "status": "running:healthy",
  "fqdn": "https://stuartmason.co.uk",
  "git": "StuMason/stuartmason@main",
  "_actions": { "restart": "control(restart)", "logs": "application_logs(uuid)" }
}
What the model receives from a list call. Five fields chosen from ninety-one.
Claude or Cursor spawns the server locally. Your Coolify token is passed as an env var and never leaves your machine.
One tools/list goes to the model: 42 consolidated descriptors at ~6.6k tokens, not 60+ granular ones at ~43k.
You ask in plain English. The model picks a tool and emits one structured call.
Args are validated with Zod and routed by action enum. This layer holds no HTTP knowledge.
Composite tools (diagnose_app, find_issues) fire 4 to 8 Coolify calls in parallel via Promise.allSettled.
Responses pass a projection layer: lists collapse to 5 fields, logs paginate and truncate, secrets mask to ***.
Each payload carries HATEOAS _actions and pagination cursors, so the model knows the next valid move for free.
A 90 to 99% smaller payload reaches the model. Docs questions resolve against a local BM25 index, no extra network call.
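The parallel fan-out used by the composite tools can be sketched as follows. This is an illustration under stated assumptions rather than the real `diagnose_app`: `Fetcher`, `Section` and `diagnoseApp` are hypothetical names, the request paths are placeholders, and only the use of `Promise.allSettled` comes from the description above.

```typescript
// Hypothetical fetcher: stands in for whatever HTTP client wraps the Coolify API.
type Fetcher = (path: string) => Promise<unknown>;

type Section = { ok: true; data: unknown } | { ok: false; error: string };

// Fire every lookup at once. A failing call becomes an error entry
// instead of rejecting the whole diagnosis.
async function diagnoseApp(
  uuid: string,
  fetchJson: Fetcher,
): Promise<Record<string, Section>> {
  // Placeholder paths, chosen for illustration only.
  const lookups: Array<[string, string]> = [
    ["app", `/applications/${uuid}`],
    ["logs", `/applications/${uuid}/logs`],
    ["envs", `/applications/${uuid}/envs`],
    ["deployments", `/deployments/applications/${uuid}`],
  ];

  const settled = await Promise.allSettled(
    lookups.map(([, path]) => fetchJson(path)),
  );

  const report: Record<string, Section> = {};
  settled.forEach((result, i) => {
    const name = lookups[i]![0];
    report[name] =
      result.status === "fulfilled"
        ? { ok: true, data: result.value }
        : { ok: false, error: String(result.reason) };
  });
  return report;
}
```

Unlike `Promise.all`, `Promise.allSettled` never rejects, so one unreachable endpoint still leaves the model with a partial report it can act on.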
// Six CRUD tools became one. Their schemas no longer
// ship to the model on every turn.
server.tool("application", {
  action: z.enum([
    "list", "get", "create",
    "start", "stop", "restart",
  ]),
  uuid: z.string().optional(),
  // ...
}, async ({ action, uuid }) => {
  switch (action) {
    case "list": return summarise(await client.apps());
    case "restart": return client.restart(resolve(uuid));
    // ...
  }
});
Collapsing six tools into one action-parameter tool saves tokens on every single turn. Across a long agent session, that is the difference between a fast, accurate assistant and one that forgets what it was doing.
TypeScript strict, Zod-validated at every boundary.
Published as io.github.StuMason/coolify, plus a security assessment badge.
No dependencies beyond the MCP SDK, Zod and a local search index. No daemon, no database.
Env values return as *** unless reveal:true is set explicitly.
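A minimal sketch of that masking rule, assuming env vars arrive as key and value pairs. `EnvVar` and `maskEnvs` are hypothetical names. Only the `***` placeholder and the explicit `reveal: true` opt-in come from the description above.

```typescript
type EnvVar = { key: string; value: string };

// Mask every value unless the caller passes reveal: true explicitly.
// Anything else (omitted, false) keeps the secrets hidden.
function maskEnvs(envs: EnvVar[], opts: { reveal?: boolean } = {}): EnvVar[] {
  if (opts.reveal === true) return envs;
  return envs.map(({ key }) => ({ key, value: "***" }));
}
```

Masking is the default path, so a forgotten option hides values rather than exposing them.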
coolify-mcp is one of several production AI systems I have built and shipped. If your agency has AI work it can't staff, that is exactly where I slot in.