# Control Plane

> The clawker control plane — what it is, what it does, and how to interact with it

The clawker **control plane** (CP) is a long-lived, privileged Go service, `clawkercp` (source in `cmd/clawkercp`), that runs as PID 1 inside the `clawker-controlplane` Docker container. It is the authoritative supervisor for every clawker-managed agent on the host: it owns the agent identity registry, the egress firewall lifecycle, the eBPF program lifetime, and the CP↔agent command channel.

You normally won't think about the control plane. The first time any clawker command needs it (`clawker firewall status`, `clawker run`, `clawker controlplane agents`, …), the CLI brings it up transparently. The `clawker controlplane` verb group exists for debugging, upgrades, and recovery — not day-to-day use.

<Note>
  The control plane is **not** the firewall. The firewall (Envoy + CoreDNS + eBPF) is one of several subsystems CP manages. Disabling the firewall via `settings.yaml` does **not** disable the control plane — CP, mTLS, and the agent registry continue to run for any other clawker container. See the [Firewall](/firewall) guide for the firewall itself.
</Note>

## What CP Does

The CP container is a single binary, `clawkercp`, running as PID 1. Inside it:

* **Ory auth stack** — Hydra (OAuth2 token issuer, `client_credentials` + `private_key_jwt` ES256), Kratos (identity), and Oathkeeper (HTTP auth proxy) are subprocess-managed by the same PID. Token validation is fail-closed.
* **AdminService gRPC** (mTLS + OAuth2 JWT, default port `7443` on host loopback) — the 13-method firewall control surface (`FirewallInit`, `FirewallEnable`, `FirewallAddRules`, `FirewallSyncRoutes`, `FirewallBypass`, …) plus `ListAgents` and `GetSystemTime` (public-scope, no bearer token required — used by the clock-sync readiness gate). Every CLI `clawker firewall *` and `clawker controlplane agents` call goes through this RPC.
* **AgentService gRPC** (mTLS, default in-container port `7444`, reachable only over `clawker-net`) — the surface clawkerd uses to register itself with CP and hold open a long-lived Session.
* **Agent registry** — a sqlite database persisted on the host XDG data dir, keyed by SHA-256 of the agent's mTLS leaf cert thumbprint plus container ID. CP is the **sole** writer; reads go through `ListAgents`. The registry survives CP restarts.
* **Overseer event bus + worldview** — an in-process typed pub/sub serializing container lifecycle (start/stop/destroy/rename), agent session lifecycle (connecting/connected/failed/broken), and trust verdict events into a deep-copyable `State` snapshot.
* **Docker events feeder** — subscribes to the local Docker daemon's event stream (with reconnect), projects managed-label-filtered events onto the overseer bus.
* **Agent watcher + clean self-shutdown**: polls Docker every 30s for `purpose=agent, managed=true` containers. After drain-to-zero (60s grace period elapsed AND 2 consecutive zero-count polls), it fires an ordered drain callback: `actionQueue.Close` → graceful gRPC stop → cancel bypass timers → Stack stop → `netlogger.Stop` (drains the eBPF egress event pipeline and flushes the OTLP BatchProcessor BEFORE BPF maps go away) → DNS GC stop → eBPF `FlushAll` → exit code 0. Because the exit code is 0, the `on-failure` restart policy does not restart the container.
* **eBPF egress event emitter (netlogger)** — drains a BPF ringbuf populated at every cgroup/connect/sendmsg/sock\_create decision and emits OTLP log records on the same mTLS-gated infra lane the CP zerolog bridge uses. Distinct `service.name=ebpf-egress` so OpenSearch routes the stream to its own index. Degrades to `event=netlogger_unavailable` (no panics) when the collector is unreachable; firewall enforcement is unaffected. See [Egress Observability](/observability) for the record shape.
* **Aggregate `/healthz`** — host-loopback HTTP on `HealthPort` (default `7080`) probes every internal service port before returning 200. Used by both `clawker controlplane status` and the host-side bootstrap to confirm readiness.
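As a rough illustration of the drain-to-zero condition described for the agent watcher, the Python sketch below tracks poll results and reports when both conditions hold. It is not CP's Go implementation: the `DrainDetector` name is hypothetical, and it assumes the 60s grace period is measured from watcher start, which this page does not specify.

```python
from dataclasses import dataclass

GRACE_SECONDS = 60.0       # grace period quoted in the text
REQUIRED_ZERO_POLLS = 2    # consecutive zero-count polls quoted in the text


@dataclass
class DrainDetector:
    """Hypothetical helper: decides when drain-to-zero has been reached."""
    started_at: float      # assumption: grace period is measured from here
    zero_streak: int = 0   # number of consecutive polls that saw zero agents

    def observe(self, now: float, agent_count: int) -> bool:
        """Record one poll result; return True when the drain should start."""
        # Any poll that sees an agent resets the streak.
        self.zero_streak = self.zero_streak + 1 if agent_count == 0 else 0
        grace_elapsed = (now - self.started_at) >= GRACE_SECONDS
        return grace_elapsed and self.zero_streak >= REQUIRED_ZERO_POLLS
```

With 30s polls, a single empty poll is never enough: an agent that appears between two polls resets the streak, so the drain only starts after two empty polls in a row once the grace period has passed.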

## Container Privileges

The CP container runs with elevated permissions because it is the host-side supervisor that loads kernel-attached eBPF programs and brings up the sibling firewall containers (Envoy, CoreDNS) on your behalf. The full privilege set is:

| Privilege                                         | Why CP needs it                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CAP_SYS_ADMIN`                                   | Required to attach cgroup-bound eBPF programs to agent containers' cgroups and to mount the bpffs pin path.                                                                                                                                                                                                                                                                                                                                                                                                       |
| `CAP_BPF`                                         | Required to load BPF programs and create/update pinned BPF maps.                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `/sys/fs/bpf` (RW bind mount)                     | Where BPF programs and maps are pinned. Pins survive across CP restarts (they're attached to cgroups, not CP's process). CP needs RW to mount the `clawker` subdirectory at boot, sync routes on rule changes, and flush state during clean shutdown.                                                                                                                                                                                                                                                             |
| `/sys/fs/cgroup` (RO bind mount)                  | Required to enumerate agent containers' cgroup paths so CP can attach BPF programs to the right cgroup.                                                                                                                                                                                                                                                                                                                                                                                                           |
| `/var/run/docker.sock` (bind mount, RO file flag) | CP subscribes to Docker events (container start/stop/destroy) and brings up the Envoy/CoreDNS sibling containers via the Docker API. The RO flag only prevents the socket file itself from being replaced inside the container — once the socket is open, the Docker daemon honors any API call over it, so CP has full Docker control regardless of the mount flag.                                                                                                                                              |
| `apparmor=unconfined`                             | Docker's `docker-default` AppArmor profile denies writes under `/sys/fs/**` (except `/sys/fs/cgroup/**`), which blocks `mkdir /sys/fs/bpf/clawker` at eBPF load even with `CAP_BPF + CAP_SYS_ADMIN`. CP runs unconfined so the bpffs pin path is writable. This mirrors the upstream cilium-agent posture (`appArmorProfile.type: Unconfined`). Defense-in-depth on the CP container relies on the docker-default seccomp profile, namespaces, masked `/proc` paths, and `no-new-privileges` — all still applied. |

**These privileges are not extended to agent containers.** The agent container itself runs fully unprivileged: `cap_add: []`, the docker-default seccomp and AppArmor profiles, no `/sys/fs/bpf` or `/sys/fs/cgroup` mount, no Docker socket. The agent's blast radius is bounded by its own container; CP's privileges exist only to *enforce* that boundary, not to relax it.

<Warning>
  CP is the privileged side of the security boundary. Anything with write access to the CP binary, the embedded `ebpf-manager`, the mounted CA, or the AdminService server certs can subvert firewall enforcement for every agent container on the host. Apply the [release verification](/threat-model#verifying-a-release) procedure to any clawker upgrade, and do not bind-mount additional host paths into the CP container via local modifications.
</Warning>

## Guarantees

1. **eBPF programs have a deterministic owner.** BPF cgroup programs and pinned maps survive the CP container's death (they're under `/sys/fs/bpf`). Without a supervisor, rule changes would silently fail and bypass timers would never expire. CP is the single owner — its drain callback is the only clean exit path that detaches and flushes eBPF state.
2. **Agent identity is auditable.** Every clawkerd instance binds itself to CP via mTLS Register before any privileged operation. The cert thumbprint is captured server-side from the live TLS handshake — agents cannot self-attest. `clawker controlplane agents` lists every binding, including which container holds which identity.
3. **Containment is real.** Because CP holds a long-lived Session to every agent's `clawkerd`, it can dispatch commands (init steps, MCP setup, shutdown signals) into a compromised container without re-authenticating each time.
4. **Auth is centralized.** Hydra issues short-lived OAuth2 tokens for every CLI↔CP gRPC call, signed by the CLI-issued auth material. The CLI is the root of trust; CP only validates.

<Warning>
  **CP crashing is a security incident, not an availability one.** If the CP container panics or exits uncleanly, the eBPF programs it attached remain pinned to your agent containers' cgroups — traffic keeps getting filtered by whatever rules were loaded at crash time, but no new rules can be applied, no bypass timers can expire, and no CP↔agent dispatch is available. Run `clawker controlplane status` if you suspect something is wrong; a `Container: stopped` result with agents still up means you should `clawker controlplane up` to re-establish supervision.
</Warning>

## How CP Boots

Two paths bring CP up:

1. **Transparent bootstrap** — the first CLI call that needs CP (most firewall commands, container creation, anything that opens an `AdminClient`) runs `cpboot.EnsureRunning` under a host-side mutex. Steps: ensure the CP image exists with a content-derived tag (`clawker-controlplane:bin-<sha>`, built on demand from the embedded binaries), `ContainerCreate` on `clawker-net` with a static IP, `ContainerStart`, then poll `http://127.0.0.1:<HealthPort>/healthz` until 200 or timeout. Idempotent — re-runs are no-ops once `/healthz` is green.
2. **Break-glass** — `clawker controlplane up` calls the same `EnsureRunning` path explicitly, useful when you want to bring CP up without triggering a side-effect command.

Either way, the CP image is built from binaries embedded in the `clawker` CLI itself (`clawkercp`, `ebpf-manager`). There's no separate image to pull. See [Installation](/installation) for the BPF toolchain requirements when building from source.

On every boot, CP reads `firewall.enable` from settings and, when it is enabled (the default), starts the Envoy + CoreDNS firewall stack before reporting ready, so a green `/healthz` means the firewall is actually enforcing. That covers boots no CLI command observes, like Docker's restart policy resurrecting a crashed CP. A failed stack bringup **fails CP startup** (the container exits non-zero): running half-protected would leave agents either unusable (their egress redirected to a dead proxy) or, worse, silently unenforced while you believe the firewall is on. The CLI surfaces the exit with a pointer to `docker logs clawker-controlplane`; fix the cause and rerun, or disable the firewall in settings to run unprotected.
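The readiness poll at the end of the bootstrap path (poll `/healthz` until 200 or timeout) can be sketched in Python as follows. This is an illustration, not the `cpboot.EnsureRunning` code: the function names, the 30s deadline, and the 0.5s interval are assumptions; only the URL shape and the default port `7080` come from this page.

```python
import http.client
import time
import urllib.request
from typing import Callable


def probe_healthz(port: int = 7080, timeout: float = 2.0) -> bool:
    """Return True if /healthz on host loopback answers 200."""
    url = f"http://127.0.0.1:{port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx responses all mean "not ready".
        return False


def wait_until_healthy(probe: Callable[[], bool],
                       deadline_s: float = 30.0,    # assumed timeout
                       interval_s: float = 0.5,     # assumed poll interval
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns True or the deadline passes."""
    give_up_at = clock() + deadline_s
    while True:
        if probe():
            return True
        if clock() >= give_up_at:
            return False
        sleep(interval_s)
```

Calling `wait_until_healthy(probe_healthz)` returns `True` as soon as the endpoint is green, which also makes a repeated call cheap when CP is already up.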

### Networking

CP joins `clawker-net` with a deterministic static IP computed by replacing the gateway's last octet with `202` — so e.g. `192.168.215.202` on a default Docker bridge with gateway `192.168.215.1`. The CLI talks to it over host loopback for `AdminClient` (mTLS gRPC on port `7443`) and `/healthz` (plain HTTP on port `7080`). The agent listener (`7444`) is **only** reachable from other containers on `clawker-net`.

When CP brings up the firewall, it places Envoy at `<network>.200` and CoreDNS at `<network>.201` on the same network (last-octet replacement, same scheme). Agent containers join `clawker-net` with `--dns` pointing at CoreDNS so DNS resolution is filtered from the very first lookup.
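The last-octet replacement scheme is simple enough to reproduce when you need to predict these addresses, for example while reading logs. The sketch below is a Python illustration with hypothetical function names; it assumes an IPv4 gateway address and uses only the octets `200`, `201`, and `202` given above.

```python
import ipaddress


def with_last_octet(gateway: str, last_octet: int) -> str:
    """Replace the last octet of an IPv4 address such as the network gateway."""
    if not 0 <= last_octet <= 255:
        raise ValueError("last octet must be in 0..255")
    packed = bytearray(ipaddress.IPv4Address(gateway).packed)
    packed[3] = last_octet
    return str(ipaddress.IPv4Address(bytes(packed)))


def clawker_net_addresses(gateway: str) -> dict[str, str]:
    """Static addresses derived from the clawker-net gateway."""
    return {
        "envoy": with_last_octet(gateway, 200),
        "coredns": with_last_octet(gateway, 201),
        "controlplane": with_last_octet(gateway, 202),
    }


print(clawker_net_addresses("192.168.215.1"))
```

For the gateway `192.168.215.1` this yields `192.168.215.200` for Envoy, `192.168.215.201` for CoreDNS, and `192.168.215.202` for CP, matching the example in the text.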

## CLI Surface

All `clawker controlplane` subcommands are break-glass — useful for debugging, upgrades, and recovery, not normal use.

| Command                       | Purpose                                                                                                                                                                                                                                                |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `clawker controlplane up`     | Idempotent `EnsureRunning`. Brings CP up if it isn't already; no-op if `/healthz` is green. When the firewall is enabled in settings (`firewall.enable`, the default), also brings the Envoy + CoreDNS firewall stack up and waits until it's healthy. |
| `clawker controlplane down`   | Stops the CP container. `clawkercp`'s SIGTERM handler runs the clean drain (`actionQueue.Close` → graceful gRPC stop → bypass timer cancel → Stack stop → netlogger stop → DNS GC stop → eBPF flush → exit 0).                                       |
| `clawker controlplane status` | Probes `/healthz`; if up, also fetches firewall subsystem state via the AdminService. Output via `--format json` for scripts.                                                                                                                          |
| `clawker controlplane agents` | Lists every agent currently registered with CP — composite (`project`, `agent_name`) plus container ID, cert thumbprint, registration time, and last-seen time. Output via `--format json` for scripts.                                                |

The `clawker auth` group manages the CLI-side auth material CP depends on:

| Command               | Purpose                                                                                                                                              |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `clawker auth rotate` | Regenerates the CA, server certs, and OAuth2 signing key bind-mounted into CP. Use when rotating keys, after a key compromise, or when reinstalling. |

See the full reference: [clawker controlplane](/cli-reference/clawker_controlplane), [clawker auth](/cli-reference/clawker_auth).

## Verifying CP Is Up

```bash theme={"dark"}
clawker controlplane status
```

```text theme={"dark"}
Container:  running
Healthz:    ✓
Firewall:   ✓
Rules:      12 active
```

If the CP container is not running, the CLI reports `Container: stopped` and omits the health and firewall fields. To bring CP back up:

```bash theme={"dark"}
clawker controlplane up
```
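Scripts should prefer `--format json`, but its schema is not shown on this page. As a stopgap, the Python sketch below parses the human-readable output shown above to decide whether CP needs to be brought back up. The function names are hypothetical, and it assumes the `Key: value` layout of the sample output stays as shown.

```python
def parse_status(output: str) -> dict[str, str]:
    """Parse 'Key:   value' lines from the status text output into a dict."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def needs_bringup(output: str) -> bool:
    """True when the status output reports the CP container as stopped."""
    return parse_status(output).get("Container") == "stopped"
```

A wrapper could capture the output of `clawker controlplane status` and run `clawker controlplane up` when `needs_bringup` returns `True`.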

To list the agents currently bound to CP:

```bash theme={"dark"}
clawker controlplane agents
```

```text theme={"dark"}
AGENT   PROJECT  CONTAINER     THUMBPRINT     REGISTERED            LAST SEEN
dev     myapp    a1b2c3d4e5f6  9f8e7d6c5b4a   2026-05-12T09:14:02Z  2026-05-12T09:42:18Z
review  myapp    7890abcdef12  1234567890ab   2026-05-12T09:14:05Z  2026-05-12T09:42:18Z
```

## Settings

CP-related ports and behavior live under `control_plane:` in `settings.yaml` (`~/.config/clawker/settings.yaml`). See [Configuration → control\_plane](/configuration#control_plane) for the schema. The defaults work out of the box; override only if a port conflicts:

```yaml theme={"dark"}
control_plane:
  admin_port: 7443        # CLI ↔ CP gRPC (host loopback, mTLS + OAuth2)
  health_port: 7080       # CLI ↔ CP /healthz (host loopback, plain HTTP)
  agent_port: 7444        # clawkerd ↔ CP gRPC (clawker-net only, mTLS)
  hydra_public_port: 4444
  hydra_admin_port: 4445
  oathkeeper_port: 4456
  oathkeeper_api_port: 4457
  kratos_public_port: 4433
  kratos_admin_port: 4434
```

`hydra_public_port` and `oathkeeper_port` are published to `127.0.0.1` on the host. The remaining Ory ports (`hydra_admin_port`, `kratos_public_port`, `kratos_admin_port`, `oathkeeper_api_port`) are **container-internal** and are not published to the host. All ports appear in settings so the in-container subprocesses agree on their port assignments.
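To find out whether a default port conflicts before you override it, you can try binding the host-loopback ports yourself. The Python sketch below is one way to do that; the helper names are hypothetical, and the port list covers only the ports this page describes as bound on host loopback.

```python
import socket

# Defaults that this page describes as bound on 127.0.0.1.
HOST_LOOPBACK_PORTS = {
    "admin_port": 7443,
    "health_port": 7080,
    "hydra_public_port": 4444,
    "oathkeeper_port": 4456,
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a successful bind means nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def conflicting_ports(ports: dict[str, int]) -> list[str]:
    """Return the setting names whose ports are already taken."""
    return [name for name, port in ports.items() if not port_is_free(port)]
```

Run `conflicting_ports(HOST_LOOPBACK_PORTS)` while CP is stopped. With CP running, its own listeners show up as conflicts.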

## Troubleshooting

**CP container won't start.**
Run `docker logs clawker-controlplane` (CP panic traces and Ory subprocess output land here, not in clawker's rotating logs). The most common causes: stale port bindings from a half-killed previous run (`clawker controlplane down` then retry), or auth material out of sync (try `clawker auth rotate`).

**`clawker run` / `clawker container start` fails with `cp clock sync deadline exceeded` (Docker Desktop).**
Before *starting* a container, the CLI brings the control plane to full readiness — which includes waiting until CP's clock (the Docker Desktop LinuxKit VM clock, where CP runs) has caught up to the host clock. The gate requires full convergence (zero leeway): CP's clock must reach the host instant. If it doesn't converge within \~30s the command fails with:

```text theme={"dark"}
starting container: bootstrapping services: ensuring control plane is running: cp clock sync deadline exceeded
```

This almost always happens after the host sleeps and wakes: the VM clock lags behind real time until its NTP source re-syncs. The bootstrap assertion is minted against the **host** clock, while Hydra validates its `iat` against CP's clock with zero leeway, so exchanging it while CP's clock still lags would fail with a Hydra `Token used before issued` 500. Rather than re-mint or skew-correct the assertion, the gate **waits** for CP's clock to reach the host's before letting the container start. Nothing unusable is baked in: the assertion stays valid, and a plain retry of `clawker run` / `clawker container start` succeeds once the VM clock has caught up, with no need to delete and recreate the container. To recover, wait a few seconds and retry, or restart Docker Desktop to force a resync. (The earlier symptom of this drift, a Hydra `Token used before issued` 500 at container start, is now caught by this pre-start gate instead.)
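The waiting behavior of the gate can be pictured with the Python sketch below. It is not the CLI's implementation: `get_cp_time` stands in for whatever reads CP's clock (this page names the `GetSystemTime` RPC for that purpose), the function name and poll interval are hypothetical, and it assumes the target is the host instant captured when the gate starts.

```python
import time
from typing import Callable


def wait_for_cp_clock(get_cp_time: Callable[[], float],
                      host_now: Callable[[], float] = time.time,
                      deadline_s: float = 30.0,    # roughly the ~30s in the text
                      interval_s: float = 0.5,     # assumed poll interval
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until CP's clock reaches the host instant captured at entry."""
    target = host_now()  # zero leeway: CP must reach this exact instant
    give_up_at = time.monotonic() + deadline_s
    while get_cp_time() < target:
        if time.monotonic() >= give_up_at:
            raise TimeoutError("cp clock sync deadline exceeded")
        sleep(interval_s)
```

The gate only waits and never adjusts either clock, which is why a later retry succeeds without recreating anything.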

**`clawker firewall *` commands hang or fail with `connection refused`.**
CP isn't running or `/healthz` is not green. Run `clawker controlplane status` to confirm, then `clawker controlplane up` to bring CP back.

**Agents appear in `clawker ps` but not in `clawker controlplane agents`.**
The agent's `clawkerd` hasn't completed the Register handshake with CP: either CP wasn't running when the container started, or the agent's mTLS material is invalid. To diagnose, check `docker logs clawker.<project>.<agent>` (look for `event=register_failed` or TLS handshake errors); `clawker auth rotate` is the typical recovery step.

**Want to know what CP is up to in real time.**
The host-side CP log file is `~/.local/state/clawker/logs/clawker-controlplane.log` (rotated). Stack traces from a CP panic land on the CP container's stderr (`docker logs clawker-controlplane`), **not** in this file — so if the rotating log is silent but agents are misbehaving, check `docker logs` first.

## See Also

* [Firewall](/firewall) — egress enforcement (one of CP's managed subsystems)
* [Container Internals](/container-internals) — what `clawkerd` (PID 1 inside each agent container) does, and how it talks to CP
* [Credentials](/credentials) — credential forwarding mechanisms unrelated to CP
* [`clawker controlplane`](/cli-reference/clawker_controlplane) — full CLI reference
* [`clawker auth`](/cli-reference/clawker_auth) — auth material rotation
