When the firewall is enabled, clawker’s control plane runs netlogger, a userspace pipeline that drains an eBPF ringbuf populated by the cgroup/connect/sendmsg/sock_create programs and emits one OTLP log record per egress decision. Every connect, sendmsg, and socket-create call from a managed agent container produces a record carrying the kernel’s verdict (allowed / denied / bypassed), the container’s attribution (agent, project, container_id), the destination 4-tuple, and the resolved domain when DNS context is available.

The headline use is bypass-mode forensic coverage. The firewall’s bypass switch (clawker firewall bypass <duration> --agent <name>) intentionally short-circuits enforcement so the operator can perform supervised exploration without rule churn. Before netlogger, bypassed traffic flowed without leaving an enforcement record: the so-called “forensic black hole” of bypass mode. netlogger emits a verdict=bypassed record at the same decision points that would have emitted allowed/denied, so an audit trail exists for every bypass window without changing enforcement semantics.

Record Shape

Each record is an OTel log emitted on the trusted infra OTLP lane with:
  • service.name = ebpf-egress (distinct from clawker-cp so retention + volume profile are independent)
  • event.name = ebpf.egress.connect / ebpf.egress.sendmsg / ebpf.egress.sock_create (per-emit-site so dashboards can filter by record kind without inspecting flag bits)
  • body = "ebpf egress"
  • severity = INFO
Attributes carried on each record are listed below. Most are unconditional: empty strings and zero numbers ship verbatim. dst_ip, dst_port, and dst_host are the exception; they are omitted when their source value is absent, so operators can partition via _exists_:attributes.<key> in OpenSearch Dashboards Discover.
| Attribute | Type | Description |
| --- | --- | --- |
| verdict | keyword | allowed, denied, or bypassed |
| container_id | keyword | Docker container ID (empty if cgroup_id not yet in label cache) |
| agent | keyword | dev.clawker.agent label (empty if cache miss) |
| project | keyword | dev.clawker.project label (empty for global-scope agents or cache miss) |
| cgroup_id | keyword | Kernel cgroup ID; trust anchor used for attribution lookup |
| bpf_ts_ns | long | Kernel monotonic timestamp at the moment of decision (bpf_ktime_get_ns) |
| dst_ip | ip | Destination address: IPv4 dotted-quad or IPv6 colon form. Mapped as type: ip, accepts both. Omitted on sock_create records (no_dst=true); operators filter via NOT _exists_:attributes.dst_ip |
| dst_port | keyword | Destination port (host byte order). Omitted on sock_create records (no_dst=true) |
| l4_proto | keyword | stream / dgram / raw (human-readable form of SOCK_*) |
| l4_proto_code | integer | Raw SOCK_* constant in case operators need to filter on a code that doesn’t have a string form yet |
| ipv6 | boolean | Native IPv6 destination; full 16-byte v6 address carried in dst_ip. Denied by default (only allowed during a bypass) |
| ipv4_mapped | boolean | ::ffff:x.x.x.x IPv4-mapped IPv6 address (the dual-stack default for most clients) |
| no_dst | boolean | Socket-creation event with no destination (sock_create program). dst_ip and dst_port are omitted on these records |
| dst_host | keyword | Resolved domain string. Populated for every record whose destination IP was resolved via the managed CoreDNS under a firewall-allowed zone. Omitted for direct-IP connects (operators filter via NOT _exists_:attributes.dst_host; see Domain Resolution) |
| domain_hash | keyword | BPF-side identity (FNV-1a of normalized domain). Correlates userspace records with BPF dns_cache / route_map entries when dst_host is empty (direct-IP connect, rule removed mid-flight, stale dnsbpf entry) |
The BPF event struct is fixed-size with explicit padding (48 bytes; layout asserted at compile time). Operators filter and aggregate at dashboard/query time — the emitter never decides which fields are “interesting” for a given verdict, except for the sock_create carve-out above where BPF carries no destination.
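Because absent destination attributes are omitted rather than shipped empty, the presence of dst_ip and dst_host is enough to partition records. The following is a minimal Python sketch of that partitioning, assuming the records have already been fetched as dictionaries of attributes. The function name, the kind labels, and the sample values are illustrative assumptions, not part of clawker.

```python
def record_kind(attrs: dict) -> str:
    """Classify an egress record by which optional attributes are present."""
    if "dst_ip" not in attrs:
        # sock_create records carry no destination (no_dst=true).
        return "sock_create"
    if "dst_host" not in attrs:
        # Destination is known, but no domain was resolved for it.
        return "unresolved_destination"
    return "resolved_destination"


# Hypothetical sample records; every value here is made up for illustration.
samples = [
    {"verdict": "allowed", "no_dst": True, "l4_proto": "stream"},
    {"verdict": "bypassed", "dst_ip": "192.0.2.10", "dst_port": "443"},
    {"verdict": "allowed", "dst_ip": "192.0.2.20", "dst_port": "443",
     "dst_host": "example.com"},
]
kinds = [record_kind(s) for s in samples]
print(kinds)
```

The query-time equivalents are the filters already named in the table: NOT _exists_:attributes.dst_ip selects the sock_create records, and NOT _exists_:attributes.dst_host selects records without a resolved domain.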

Where Records Land

Records flow:
  agent container egress decision
        │
        ▼
  BPF cgroup program (events_ringbuf)
        │
        ▼
  netlogger (CP-side userspace)
        │  enrich {cgroup_id → container_id, agent, project, domain}
        ▼
  OTLP/gRPC + mTLS via netlogger's *sdklog.LoggerProvider
   (built by controlplane.NewOtelLoggerProvider, a
    generic factory reusable for any future CP-side
    OTLP log emitter)
        │
        ▼
  otel-collector trusted lane (otlp/infra receiver)
        │
        ▼
  routing/trusted connector (service.name=ebpf-egress)
        │
        ▼
  OpenSearch index: clawker-ebpf-egress
The index is preconfigured by the clawker-opensearch-bootstrap one-shot service every time clawker monitor up runs. The retention policy (default 7 days, throwaway-stack semantics) auto-attaches via the same ISM policy that covers the other clawker indices. Cross-index queries against clawker-cp,clawker-envoy,clawker-coredns,clawker-ebpf-egress work out of the box — ingest_source is stamped on every record for filtering.
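As a rough illustration of querying the index, the sketch below builds a search request for the standard OpenSearch _search API using a query_string query. It only constructs the path and body; sending it is left to whatever HTTP client is in use. The attributes.<key> field path follows the filter syntax shown under Record Shape, while the helper name and the agent name "dev" are hypothetical.

```python
import json

EGRESS_INDEX = "clawker-ebpf-egress"
ALL_INDICES = ["clawker-cp", "clawker-envoy", "clawker-coredns", EGRESS_INDEX]


def search_request(indices, lucene, size=100):
    """Return (path, body) for a multi-index _search call."""
    # OpenSearch accepts a comma-separated index list in the request path.
    path = "/" + ",".join(indices) + "/_search"
    body = {"size": size, "query": {"query_string": {"query": lucene}}}
    return path, body


# Bypassed records for one (hypothetical) agent, egress index only.
path, body = search_request(
    [EGRESS_INDEX],
    'attributes.verdict:bypassed AND attributes.agent:"dev"',
)
print("GET", path)
print(json.dumps(body, indent=2))
```

Passing ALL_INDICES instead of the single index produces the cross-index form of the same request.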

Per-Connection Bytes and Duration

Not in this stream. netlogger records the decision: the moment the kernel approved, denied, or bypassed an outbound connection. Byte counts and durations belong to the L7 proxy lifecycle, not the decision point.
  • For verdict=allowed records, the matching Envoy access log carries bytes_sent, bytes_received, duration_ms. Operators pivot from a netlogger record to the corresponding Envoy record by 5-tuple at query time.
  • For verdict=denied records there are no bytes to record, because no traffic flowed.
  • For verdict=bypassed records, only the netlogger record exists. Envoy and CoreDNS enforcement are skipped under bypass by design.
Sock_ops-based per-connection byte tracking inside BPF is not on this stream’s roadmap. It would double the BPF surface area, leave UDP/connectionless flows without an analogous signal, and overlap with Envoy’s access-log emission for the cases where it matters.

Domain Resolution

dst_host is populated for every record whose destination IP came from a dnsbpf-resolved A record under a firewall-allowed zone. The translation is control-plane-driven: the BPF dns_cache map stores {domain_hash, expire_ts} keyed by IPv4, and netlogger’s reverse-DNS map maintains the inverse hash → domain table by hashing the live set of firewall rule destinations + internal hosts (docker.internal + monitoring service hostnames) on a 5-second refresh tick. The hash function (internal/controlplane/firewall/ebpf.DomainHash, FNV-1a) is the same one dnsbpf computes when it writes dns_cache, so the two sides agree on the identity by construction.
dst_host is absent (omitted from the record) when:
  • The destination IP was reached without DNS resolution (direct-IP connect).
  • The IP was resolved through a path other than the managed CoreDNS (e.g., /etc/hosts entry inside the agent container).
  • A rule was removed and netlogger hasn’t yet refreshed (worst case: 5 seconds of stale records on the previously-allowed domain).
The hash space is 32-bit FNV-1a, so collisions between two distinct domains are possible in theory. At deployment-typical rule-set sizes (single digits to hundreds of firewall-rule domains), that collision floor is operationally irrelevant. The route-identity-allocator follow-up replaces the hash with userspace-allocated sequential identities, matching Cilium’s pkg/fqdn/namemanager pattern.
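The hash-and-invert scheme can be sketched in a few lines of Python. This is an illustration rather than the project's DomainHash: the FNV-1a 32-bit constants are the standard ones, but the normalization shown (lowercase, strip trailing dots) and the helper names are assumptions. The last function applies the usual birthday approximation to estimate the chance of any collision among n domains.

```python
import math

FNV32_OFFSET = 0x811C9DC5  # standard FNV-1a 32-bit offset basis
FNV32_PRIME = 0x01000193   # standard FNV 32-bit prime


def normalize(domain: str) -> str:
    # Assumed normalization: lowercase and drop trailing dots.
    return domain.lower().rstrip(".")


def domain_hash(domain: str) -> int:
    """FNV-1a over the normalized domain: XOR each byte, then multiply."""
    h = FNV32_OFFSET
    for byte in normalize(domain).encode("utf-8"):
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def build_reverse_map(domains) -> dict:
    """Inverse hash -> domain table, rebuilt from the live rule set."""
    return {domain_hash(d): normalize(d) for d in domains}


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: P(any collision among n random hashes)."""
    return -math.expm1(-n * (n - 1) / (2.0 * 2 ** bits))


# Hypothetical rule destinations, for illustration only.
reverse = build_reverse_map(["Example.com.", "api.example.org"])
print(reverse.get(domain_hash("example.com")))
print(f"{collision_probability(100):.2e}")
```

With 100 domains the approximation gives a collision probability on the order of one in a million, which is consistent with the claim that the floor does not matter at typical rule-set sizes.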

Reliability

netlogger is engineered to fail open with respect to the firewall — enforcement runs whether or not netlogger is healthy.
  • BPF token-bucket rate limiter keyed by cgroup_id (burst 64, refill 64 tokens/100ms ⇒ ~640 records/sec/cgroup ceiling). A misbehaving container cannot monopolize the ringbuf; throttled events are counted in ratelimit_drops, keyed by the noisy cgroup.
  • Kernel-fault drop counter (events_drops, PERCPU_ARRAY) bumps when bpf_ringbuf_reserve returns NULL on a full buffer — distinct from rate-limit drops so the operator response is different (ringbuf size vs. noisy-agent triage).
  • Userspace queue between the ringbuf reader and the processor is bounded with drop-newest semantics; the reader never blocks on the consumer. Drops are counted in clawker_netlogger_queue_dropped_total (Prom counter declared; scrape exposure is not wired).
  • Circuit breaker wraps the OTLP exporter: three consecutive Export() failures permanently trip the breaker for the rest of the CP lifetime. Records drop on the floor afterward; the BatchProcessor queue drains via the SDK’s own drop-oldest path. No background reconnect — telemetry availability is binary per-CP-lifetime by design. Operator response: restart CP after fixing the collector.
  • Preflight TLS dial runs at CP boot with a 20-second deadline against the configured OTLP endpoint. Failure degrades netlogger to a no-op for the rest of the CP lifetime and emits event=netlogger_unavailable (warn for “no endpoint configured”, error for actual failures like cert problems or unreachable collector). The firewall, AdminService, agent dispatch, and registry are unaffected — netlogger’s failure is contained.
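The rate-limit arithmetic in the first bullet (64 tokens per 100 ms, ten intervals per second, so about 640 records/sec per cgroup) can be checked with a small simulation. This Python model is a sketch, not the BPF code: it assumes whole-interval refills capped at the burst size, and the class names and cgroup IDs are hypothetical.

```python
class TokenBucket:
    def __init__(self, start_ns, burst=64, refill_tokens=64,
                 refill_interval_ns=100_000_000):
        self.burst = burst
        self.refill_tokens = refill_tokens
        self.refill_interval_ns = refill_interval_ns
        self.tokens = burst
        self.last_refill_ns = start_ns

    def allow(self, now_ns):
        # Credit whole elapsed intervals, never exceeding the burst size.
        intervals = (now_ns - self.last_refill_ns) // self.refill_interval_ns
        if intervals > 0:
            self.tokens = min(self.burst,
                              self.tokens + intervals * self.refill_tokens)
            self.last_refill_ns += intervals * self.refill_interval_ns
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


class CgroupRateLimiter:
    """One bucket per cgroup_id; drops are counted per noisy cgroup."""

    def __init__(self):
        self.buckets = {}
        self.ratelimit_drops = {}

    def allow(self, cgroup_id, now_ns):
        if cgroup_id not in self.buckets:
            self.buckets[cgroup_id] = TokenBucket(start_ns=now_ns)
        if self.buckets[cgroup_id].allow(now_ns):
            return True
        self.ratelimit_drops[cgroup_id] = self.ratelimit_drops.get(cgroup_id, 0) + 1
        return False


limiter = CgroupRateLimiter()
NOISY, QUIET = 1001, 2002  # hypothetical cgroup IDs
noisy_allowed = quiet_allowed = 0
for i in range(20_000):            # 2 seconds, one noisy event every 100 µs
    now = i * 100_000
    noisy_allowed += limiter.allow(NOISY, now)
    if i % 1000 == 0:              # quiet cgroup: one event every 100 ms
        quiet_allowed += limiter.allow(QUIET, now)

print(noisy_allowed / 2, "records/sec for the noisy cgroup")
print(limiter.ratelimit_drops)
```

Under these assumptions the noisy cgroup settles at the 640 records/sec ceiling while the quiet cgroup loses nothing, since each cgroup draws from its own bucket.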
If netlogger is degraded, the kernel records continue to land in the ringbuf and are dropped on the floor — the BPF programs themselves are unaffected, no enforcement decision is missed, and the BPF token-bucket prevents unbounded buildup of pinned ringbuf records.
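Two of the userspace behaviours above, the drop-newest queue and the one-way circuit breaker, are simple enough to model directly. The Python sketch below illustrates the described semantics only; the class names, the queue capacity, and the exporter callback are assumptions and do not mirror netlogger's actual types.

```python
from collections import deque


class DropNewestQueue:
    """Bounded queue: when full, the incoming item is dropped and counted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def offer(self, item):
        if len(self.items) >= self.capacity:
            self.dropped += 1      # the producer never blocks
            return False
        self.items.append(item)
        return True

    def poll(self):
        return self.items.popleft() if self.items else None


class OneWayBreaker:
    """Trips after `threshold` consecutive export failures; never resets."""

    def __init__(self, export, threshold=3):
        self._export = export
        self.threshold = threshold
        self.consecutive_failures = 0
        self.tripped = False

    def export(self, batch):
        if self.tripped:
            return False           # records drop on the floor
        try:
            self._export(batch)
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.tripped = True
            return False
        self.consecutive_failures = 0  # a success breaks the failure streak
        return True


calls = []

def flaky_export(batch):
    # Hypothetical exporter: fails for any batch other than "ok".
    calls.append(batch)
    if batch != "ok":
        raise RuntimeError("collector unreachable")

breaker = OneWayBreaker(flaky_export)
results = [breaker.export(b) for b in ["bad", "ok", "bad", "bad", "bad", "ok"]]
print(results, breaker.tripped, len(calls))

queue = DropNewestQueue(capacity=2)
accepted = [queue.offer(n) for n in (1, 2, 3)]
print(accepted, queue.dropped, queue.poll())
```

The final "ok" batch is refused without the exporter being called, which is the permanent-trip behaviour: only a restart (a new breaker instance in this model) restores export.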

Trust Lane

netlogger emits on the trusted infra lane — the same OTLP/gRPC + mTLS path the CP zerolog bridge, the Envoy access logger, and the CoreDNS otel plugin use. Identity reuse:
  • Cert: per-handshake ephemeral leaf minted by otelcerts.Service (LoadTLSConfig("netlogger")), chained through the infra intermediate CA — not the CLI root.
  • Endpoint: the collector’s otlp/infra receiver on OtelInfraPort (not the unauth’d otel-collector:4317 agent lane).
  • The OTLP endpoint must be https:// (or bare host:port). A plaintext http:// endpoint is rejected at boot — pushing infra telemetry over plaintext would smuggle records onto the agent-lane receiver, defeating the trust-lane separation.
Agent containers cannot forge service.name=ebpf-egress records onto the trusted index — they don’t hold a leaf chained through the infra intermediate, so the receiver’s TLS handshake fails the chain check. The strict-directive promise (every field on every record, no discretion) only delivers if the records actually originate from the CP — the mTLS boundary is what makes that promise enforceable.
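The endpoint rule (https:// or bare host:port accepted, plaintext http:// rejected) amounts to a small validation step. The following Python sketch shows one way such a check could look; the function name, the error messages, and the example host and port are hypothetical, and the real boot-time check may differ in detail.

```python
def validate_otlp_endpoint(endpoint: str) -> str:
    """Return host:port for an acceptable endpoint, or raise ValueError."""
    endpoint = endpoint.strip()
    if "://" in endpoint:
        scheme, rest = endpoint.split("://", 1)
        if scheme.lower() != "https":
            # Plaintext (or any non-TLS scheme) is refused outright.
            raise ValueError(f"scheme {scheme!r} rejected: infra lane requires TLS")
    else:
        rest = endpoint  # bare host:port is treated as TLS
    host_port = rest.rstrip("/")
    if not host_port:
        raise ValueError("empty OTLP endpoint")
    return host_port


# Hypothetical collector address, for illustration only.
print(validate_otlp_endpoint("https://collector.example:4319"))
print(validate_otlp_endpoint("collector.example:4319"))
try:
    validate_otlp_endpoint("http://collector.example:4319")
except ValueError as err:
    print("rejected:", err)
```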

Configuration

netlogger inherits its endpoint from the standard OTEL_EXPORTER_OTLP_ENDPOINT resolution path used by the CP zerolog bridge — when clawker monitor up is running, the CP boot sequence wires the collector’s OtelInfraPort automatically. No netlogger-specific knobs ship; the BatchProcessor sizing, retry cap (10s vs. SDK default 1 min), and circuit-breaker threshold are CP-level constants. To point netlogger at a custom collector (for a centralized SIEM, or to bypass the local stack entirely), override OTEL_EXPORTER_OTLP_ENDPOINT in the CP container’s environment and ensure the receiver presents a server cert chained through the infra intermediate. Plaintext endpoints are rejected by design — see Trust Lane above.

Current Limitations

  • FNV-1a 32-bit identity — the BPF data path uses an FNV-1a hash of the lowercased domain as a fixed-width identity in dns_cache and route_map. Collision-vulnerable in theory; harmless in practice at deployment-typical rule-set sizes. The route-identity-allocator follow-up replaces this with userspace-allocated sequential u32 identities (Cilium pattern).
  • Prom counters not scraped — the clawker_netlogger_* counters are declared but not wired into a /metrics endpoint. The structured CP log surface is the operational signal for throughput, queue drops, parse errors, and OTLP export success/error. Kernel-side drop counters (events_drops, ratelimit_drops) live on the firewall/eBPF subsystem surface — they are subsystem health, not security telemetry, and intentionally don’t ride on the netlogger OTel stream.

See Also

  • Firewall — the source of every record on this stream (decision points, bypass mechanics, rule lifecycle).
  • Monitoring — the OpenSearch + OpenSearch Dashboards + Prometheus stack that hosts the clawker-ebpf-egress index.