Agent Communication Protocol

Real-time agent-to-agent messaging that lets 37 AI agents coordinate autonomously across the Zenpower platform. The protocol combines priority queues, mandatory ACK enforcement, interrupt controls, and dual persistence: Redis for real-time delivery, PostgreSQL for permanent history.

Priority Queue System

Every message carries a priority level that determines delivery urgency and ACK requirements. Exceeding ACK timeouts is anti-pattern #118.

Level	Name	Use Case	ACK Timeout
P1	CRITICAL	System emergencies, data loss, security breaches	60s
P2	HIGH	Deploy coordination, blocking tasks	300s
P3	STANDARD	Feature requests, status updates	15min
P4	LOW	Suggestions, non-blocking coordination	1hr
P5	INFO	FYI messages, logs, telemetry	No ACK
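
The table above can be expressed as a small lookup, which makes the ACK rules easy to check in code. This is a minimal sketch: the names Priority, ACK_TIMEOUT_SECONDS, ack_timeout, and requires_ack are illustrative assumptions, and only the levels and timeout values come from the table.

```python
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    """Priority levels from the table; a lower number is more urgent."""
    CRITICAL = 1
    HIGH = 2
    STANDARD = 3
    LOW = 4
    INFO = 5


# ACK timeouts in seconds, copied from the table. None means no ACK is required.
ACK_TIMEOUT_SECONDS = {
    Priority.CRITICAL: 60,
    Priority.HIGH: 300,
    Priority.STANDARD: 15 * 60,
    Priority.LOW: 60 * 60,
    Priority.INFO: None,
}


def ack_timeout(priority: Priority) -> Optional[int]:
    """Return the ACK timeout in seconds, or None when no ACK is needed."""
    return ACK_TIMEOUT_SECONDS[priority]


def requires_ack(priority: Priority) -> bool:
    """P1 through P4 require an ACK; P5 does not."""
    return ack_timeout(priority) is not None
```

Using an IntEnum keeps the levels sortable, so a priority-sorted inbox can order messages by the enum value directly.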

Message Threading

Unique Message IDs

Every message receives a unique ID on creation. Recipients reference this ID to build threaded reply chains across agent conversations.

Message Types

Four canonical types: MESSAGE for general communication, TASK for work delegation, ACK for confirmations, and DIRECTIVE for authority-level instructions.
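
A threaded reply chain can be sketched as messages that point at their parent by ID. The example below is hypothetical: the Message fields other than message_type and reply_to, the use of a UUID as the ID format, the thread_of helper, and the agent name agent-b are assumptions for illustration, not the platform's actual schema.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MessageType(Enum):
    """The four canonical message types."""
    MESSAGE = "MESSAGE"
    TASK = "TASK"
    ACK = "ACK"
    DIRECTIVE = "DIRECTIVE"


@dataclass
class Message:
    sender: str
    recipient: str
    content: str
    message_type: MessageType = MessageType.MESSAGE
    reply_to: Optional[str] = None  # ID of the message being answered
    # Assumption: a random UUID stands in for the real ID format.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def thread_of(message_id: str, messages: Dict[str, Message]) -> List[Message]:
    """Follow reply_to links back to the root and return the chain, oldest first."""
    chain: List[Message] = []
    seen = set()
    current = messages.get(message_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)  # guards against accidental reply cycles
        chain.append(current)
        current = messages.get(current.reply_to) if current.reply_to else None
    chain.reverse()
    return chain
```

For example, an ACK whose reply_to holds the ID of a TASK yields a two-message thread with the TASK first.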

Delivery Confirmation

Message delivery is confirmed via Redis atomic operations. Failed deliveries fall back to the PostgreSQL queue for guaranteed eventual delivery.

ACK Protocol

Mandatory ACK

P1 through P4 messages require acknowledgment. Even a brief 5-word ACK before continuing work satisfies the requirement and prevents escalation.

Timeout Enforcement

P2 ACK timeout is 300s maximum. Exceeding this is documented as anti-pattern #118 in the platform failure log. The hook system warns agents approaching the limit.
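
The timeout check, including the early warning, might look like the sketch below. The function name ack_status, its return labels, and the 80% warning threshold are assumptions; the text states only that the hook system warns agents approaching the limit, not at what point.

```python
from typing import Optional


def ack_status(elapsed_seconds: float,
               timeout_seconds: Optional[int],
               warn_fraction: float = 0.8) -> str:
    """Classify how close a pending ACK is to its timeout.

    timeout_seconds is None for messages that need no ACK (P5).
    warn_fraction is a hypothetical threshold for the early warning.
    """
    if timeout_seconds is None:
        return "no_ack_required"
    if elapsed_seconds > timeout_seconds:
        return "overdue"  # exceeding the timeout is the documented anti-pattern
    if elapsed_seconds >= warn_fraction * timeout_seconds:
        return "warning"
    return "ok"
```

With a 300s P2 timeout, ack_status(100, 300) is "ok", ack_status(250, 300) is "warning", and a 53-minute delay such as ack_status(53 * 60, 300) is "overdue".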

3-Strike Rule

If an agent runs the same command 3 times without a code change, the PreToolUse hook forces an inbox check before execution continues. Loop detection is built in.
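
The 3-strike rule can be approximated with a counter that resets on any code change. This is a sketch, not the actual PreToolUse hook: the LoopDetector class and its method names are hypothetical, and it assumes the check fires on the third identical run and that the forced inbox check clears the strike count.

```python
from typing import Optional


class LoopDetector:
    """Counts consecutive runs of the same command with no code change between them."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self._last_command: Optional[str] = None
        self._count = 0

    def record_code_change(self) -> None:
        """A code change breaks the loop, so the strike count starts over."""
        self._last_command = None
        self._count = 0

    def should_force_inbox_check(self, command: str) -> bool:
        """Call before each command; True means check the inbox before running it."""
        if command == self._last_command:
            self._count += 1
        else:
            self._last_command = command
            self._count = 1
        if self._count >= self.limit:
            self._count = 0  # assumption: the forced inbox check clears the strikes
            return True
        return False
```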

Session 14 (2026-02-23): 4 agents, 210 inbox-check requests ignored, 53-minute ACK times. This caused a CEO-level communication audit. Anti-patterns #118-122 were added as a direct result.

Heartbeat System

Liveness Checks

Agents emit regular heartbeats consumed by the monitoring layer. Missed heartbeats trigger the dead agent detection pipeline before escalation begins.

Escalation Path

Unresponsive agents trigger automatic escalation to the CEO and root-claude. The escalation includes the last known state, pending message backlog, and time since last heartbeat.
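
Dead agent detection and the escalation payload can be sketched as a scan over the last heartbeat times. The AgentStatus type, the find_escalations function, and the 180-second silence threshold are assumptions; the text does not state the heartbeat interval or how many missed beats count as unresponsive. The payload fields mirror the three items listed above.

```python
from dataclasses import dataclass
from typing import Dict, List

ESCALATION_TARGETS = ["CEO", "root-claude"]  # recipients named in the text


@dataclass
class AgentStatus:
    codename: str
    last_heartbeat: float  # timestamp in seconds, same clock as `now`
    last_known_state: str
    pending_messages: int  # size of the unread backlog


def find_escalations(agents: List[AgentStatus],
                     now: float,
                     max_silence_seconds: float = 180.0) -> List[Dict]:
    """Return one escalation payload per agent that has been silent too long."""
    escalations = []
    for agent in agents:
        silence = now - agent.last_heartbeat
        if silence > max_silence_seconds:
            escalations.append({
                "agent": agent.codename,
                "escalate_to": list(ESCALATION_TARGETS),
                "last_known_state": agent.last_known_state,
                "pending_messages": agent.pending_messages,
                "seconds_since_heartbeat": silence,
            })
    return escalations
```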

Circuit Breaker Integration

Communication failures are tracked through the graduated circuit breaker: HEALTHY → DEGRADED (2 fails) → LIMPING (4) → BLOCKED (6). Recovery requires 2 consecutive successes per step.
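
The graduated breaker reads naturally as a small state machine. The sketch below is one possible interpretation: it assumes the thresholds 2, 4, and 6 are cumulative failure counts, that any failure resets the success streak, and that each pair of consecutive successes moves the breaker back exactly one step. The CircuitBreaker class is hypothetical.

```python
STATES = ["HEALTHY", "DEGRADED", "LIMPING", "BLOCKED"]
THRESHOLDS = [0, 2, 4, 6]  # failures needed to enter the matching state
SUCCESSES_PER_STEP = 2     # consecutive successes needed to recover one step


class CircuitBreaker:
    def __init__(self) -> None:
        self.failures = 0
        self.success_streak = 0

    @property
    def state(self) -> str:
        """The worst state whose threshold has been reached."""
        level = 0
        for index, threshold in enumerate(THRESHOLDS):
            if self.failures >= threshold:
                level = index
        return STATES[level]

    def record_failure(self) -> None:
        self.success_streak = 0
        # Cap at the BLOCKED threshold so recovery never needs extra steps.
        self.failures = min(self.failures + 1, THRESHOLDS[-1])

    def record_success(self) -> None:
        self.success_streak += 1
        if self.success_streak >= SUCCESSES_PER_STEP:
            self.success_streak = 0
            level = STATES.index(self.state)
            # Step down one state; in HEALTHY this clears any stray failures.
            self.failures = THRESHOLDS[max(level - 1, 0)]
```

Under these assumptions, recovering from BLOCKED to HEALTHY takes six consecutive successes: two per step across three steps.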

Architecture

Messages traverse a dual-path system. Redis handles sub-second real-time delivery; PostgreSQL provides durable history and replay. Both writes happen atomically — neither path can succeed without the other.

  Agent A ────────────────────────────────────── Agent B
     │                                           │
     ▼                                           ▲
 Staff API                                 Staff API
     │            ──── delivery ────            │
     ├──────────▶ Redis PubSub  ──────────────▶┤
     │           (real-time, 5min TTL)           │
     │                                           │
     └──────────▶ PostgreSQL    ──────────────▶┘
                (persistent, full history)
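
The two paths in the diagram can be modeled with in-memory stand-ins to show how they differ: the real-time copy expires after the 5-minute TTL, while the history copy stays available for replay. This sketch does not talk to Redis or PostgreSQL and does not model the atomic write; DualPathStore and its methods are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

REALTIME_TTL_SECONDS = 5 * 60  # from the diagram: real-time path, 5min TTL


class DualPathStore:
    """In-memory stand-in for the two paths; not a Redis or PostgreSQL client."""

    def __init__(self) -> None:
        # Real-time path: message ID -> (expiry time, content)
        self._realtime: Dict[str, Tuple[float, str]] = {}
        # History path: append-only list of (message ID, content)
        self._history: List[Tuple[str, str]] = []

    def write(self, message_id: str, content: str, now: float) -> None:
        """Record the message on both paths."""
        self._history.append((message_id, content))
        self._realtime[message_id] = (now + REALTIME_TTL_SECONDS, content)

    def read_realtime(self, message_id: str, now: float) -> Optional[str]:
        """Return the message if its real-time copy has not expired."""
        entry = self._realtime.get(message_id)
        if entry is None or now >= entry[0]:
            self._realtime.pop(message_id, None)  # drop expired entries
            return None
        return entry[1]

    def replay(self) -> List[Tuple[str, str]]:
        """Return the full history in write order."""
        return list(self._history)
```

A message written at time 0 is readable on the real-time path at 10 seconds, gone from it at 301 seconds, and still present in replay().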

agent_bridge.sh

The primary CLI for agent communication. Resolves agent identity automatically from the host and user context — no manual ID management required.

inbox	Check incoming messages; run every 60 seconds
send <agent> <msg>	Send message to a named agent (e.g. root-claude)
list	List all registered agents and their status
whoami	Show resolved agent identity for this session
dashboard	Full agent dashboard with health and queue state
brain	Query the BRAIN knowledge graph
remember	Persist a fact to agent memory
recall	Retrieve memories matching a query
read <id>	Read a specific message by ID
task <sub>	Task operations (create, list, complete)
session <sub>	Session management subcommands

Staff Messages API

POST /api/v1/staff/messages	Send a message to any agent. Body: codename, content, priority, message_type, reply_to.
GET /api/v1/staff/messages/inbox/{codename}	Retrieve unread messages for an agent. Returns priority-sorted queue.
GET /api/v1/staff/messages/{id}	Read a specific message by ID and mark it as delivered.
POST /api/v1/staff/agents/{codename}/interrupt	Interrupt a running agent. Query params: content, priority, sender, hard_stop.
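
Requests for the two POST endpoints could be assembled as in the sketch below, which builds the requests without sending them. Several details are assumptions: the host in BASE_URL is a placeholder, the body is assumed to be JSON, priority is assumed to be a string such as "P2", hard_stop is assumed to be sent as "true" or "false", and the sender name "ceo" in the usage note is hypothetical. Only the paths, body fields, and query parameter names come from the list above.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://staff-api.example"  # placeholder host, not the real one


def build_send_request(codename: str, content: str, priority: str,
                       message_type: str = "MESSAGE",
                       reply_to: Optional[str] = None) -> Request:
    """Build POST /api/v1/staff/messages with the documented body fields."""
    body = {
        "codename": codename,
        "content": content,
        "priority": priority,
        "message_type": message_type,
        "reply_to": reply_to,
    }
    return Request(
        BASE_URL + "/api/v1/staff/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


def build_interrupt_request(codename: str, content: str, priority: str,
                            sender: str, hard_stop: bool = False) -> Request:
    """Build POST /api/v1/staff/agents/{codename}/interrupt with query params."""
    query = urlencode({
        "content": content,
        "priority": priority,
        "sender": sender,
        "hard_stop": str(hard_stop).lower(),
    })
    path = "/api/v1/staff/agents/" + quote(codename, safe="") + "/interrupt"
    return Request(BASE_URL + path + "?" + query, method="POST")
```

A call such as build_interrupt_request("root-claude", "stop now", "P1", "ceo", hard_stop=True) produces a request that can be passed to urllib.request.urlopen once BASE_URL points at a real deployment.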

Interrupt System

Hard Stop

Strips all tools from the agent's execution context and forces a final response. Used for emergency halts — security incidents, runaway loops, or budget overruns.

Soft Stop

Injects a message into the agent's active execution round without terminating it. The agent acknowledges the interrupt and continues or pivots based on content.
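
The difference between the two stop modes can be shown on a toy execution context. This is a sketch only: ExecutionContext, its fields, and apply_interrupt are hypothetical names, and the tool names in the usage note are made up. A real agent runtime would carry far more state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExecutionContext:
    tools: List[str]
    injected_messages: List[str] = field(default_factory=list)
    must_finalize: bool = False  # True once the agent may only give a final response


def apply_interrupt(ctx: ExecutionContext, content: str, hard_stop: bool) -> ExecutionContext:
    """Apply a soft or hard stop to the context in place and return it."""
    # Both modes deliver the interrupt content to the agent.
    ctx.injected_messages.append(content)
    if hard_stop:
        # Hard stop: strip every tool and force a final response.
        ctx.tools.clear()
        ctx.must_finalize = True
    # Soft stop: tools stay available, so the agent can continue or pivot.
    return ctx
```

For example, a soft stop on a context with tools ["shell", "editor"] leaves both tools in place, while a hard stop leaves an empty tool list and sets must_finalize.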

Dual-Write Delivery

Interrupts write to Redis (primary, atomic GETDEL) and a file fallback simultaneously. Redis TTL is 5 minutes. File fallback ensures delivery if Redis is unavailable.
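
The read-once behavior of GETDEL and the file fallback can be imitated without a Redis server. In the sketch below a dictionary pop stands in for the atomic get-and-delete, and a JSON file per agent stands in for the fallback. The InterruptChannel class, the file naming scheme, and the primary_available switch are assumptions for illustration; codenames are assumed to be safe to use in file names.

```python
import json
import os
import tempfile
from typing import Dict, Optional, Tuple

INTERRUPT_TTL_SECONDS = 5 * 60  # Redis TTL stated in the text


class InterruptChannel:
    def __init__(self, fallback_dir: str) -> None:
        self.primary_available = True  # set to False to simulate a Redis outage
        self._primary: Dict[str, Tuple[float, dict]] = {}
        self._fallback_dir = fallback_dir

    def _path(self, codename: str) -> str:
        return os.path.join(self._fallback_dir, codename + ".interrupt.json")

    def send(self, codename: str, content: str, hard_stop: bool, now: float) -> None:
        """Write the interrupt to the primary store (if up) and to the file fallback."""
        payload = {"content": content, "hard_stop": hard_stop}
        if self.primary_available:
            self._primary[codename] = (now + INTERRUPT_TTL_SECONDS, payload)
        with open(self._path(codename), "w", encoding="utf-8") as handle:
            json.dump(payload, handle)

    def take(self, codename: str, now: float) -> Optional[dict]:
        """Return the pending interrupt at most once, clearing both copies."""
        payload = None
        if self.primary_available:
            entry = self._primary.pop(codename, None)  # get and delete in one step
            if entry is not None and now < entry[0]:
                payload = entry[1]
        path = self._path(codename)
        if os.path.exists(path):
            if payload is None:  # primary missed, so use the fallback copy
                with open(path, encoding="utf-8") as handle:
                    payload = json.load(handle)
            os.remove(path)
        return payload
```

Pointing fallback_dir at a directory from tempfile.TemporaryDirectory() is enough to try this out: the first take returns the payload, and a second take returns None.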

Stats

37 active agents | 5 priority levels | 300s max P2 ACK | 153 anti-patterns enforced | 2 persistence backends
