AgentBus: Building a Message Bus for Heterogeneous Agents

Why we built a NATS-backed message bus for cross-machine agent coordination alongside A2A, and what the design ended up looking like.

When Google and the Linux Foundation shipped the A2A protocol earlier this year, the obvious question was: do we need to build anything ourselves?

We did — but not because A2A is wrong. A2A is the right answer for how agents present themselves to the outside world and interop with agents they don’t control. The gap we were trying to fill is different: how agents on your own infrastructure actually coordinate at runtime.

The analogy that clarified it

A web application speaks HTTP externally. Internally, it uses message queues, database connections, and in-process function calls — not HTTP. HTTP is the boundary protocol, not the operational fabric.

A2A plays the same role for agents: it is the boundary protocol, not the operational fabric. You still need somewhere for the agents to live.

What we built

AgentBus is that place. It uses NATS internally for speed and real-time messaging, and it is designed to be A2A-compatible at the boundary, so that any A2A client can call your agents and your agents can call any A2A endpoint.

                        ┌─ NATS (internal, fast) ───┐
Machine A (sender)  ────┤                           ├──── Machine B (listener)
  agentbus send         └─ A2A HTTP (external) ─────┘       agentbus listen

Why NATS?

A2A over HTTP has a latency and ergonomics problem for real-time coordination.

A2A’s streaming story is Server-Sent Events — one direction, HTTP-based, requires an HTTP server on the listener side. For an agent running on a laptop or a VM that spins up on demand, standing up an HTTP server with a stable public endpoint is operational overhead that adds nothing.

NATS gives you:

Pub/sub with typically sub-millisecond broker latency on a local network, plus wildcard subjects (agent.*.inbox, event.result.>)
Bidirectional by default — both sides can publish and subscribe
Request/reply built in (correlation IDs, reply subjects, timeouts)
No server required on the agent side — agents connect out to NATS, not the reverse

So sending a task from a laptop to a VM agent looks like this:

agentbus send --to vm-agent --command "pytest tests/ -x"
# ← stdout arrives in real-time
# all 47 tests passed in 3.2s

No HTTP server running on the VM. No ngrok. No port forwarding.
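To make the wildcard subjects mentioned above concrete, here is a minimal Python sketch of NATS-style subject matching, where `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. It is an illustration of the matching rules only, not AgentBus’s or the NATS server’s code; the function name and the example subjects are made up for this sketch.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject.

    '*' matches exactly one token. '>' matches one or more trailing tokens
    and is only meaningful as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the final pattern token and needs at least one
            # subject token left to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # No '>' seen: token counts must agree exactly.
    return len(p_tokens) == len(s_tokens)


if __name__ == "__main__":
    # Hypothetical subject names, chosen to fit the patterns in the list above.
    print(subject_matches("agent.*.inbox", "agent.vm-agent.inbox"))      # True
    print(subject_matches("event.result.>", "event.result.pytest.run1"))  # True
```

A listener subscribed to `agent.*.inbox` would therefore see every agent’s inbox, while `event.result.>` catches results at any depth below `event.result`.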

The Guardian layer

Before any message reaches a handler, it passes through Guardian — the security and validation layer.

Three things Guardian does:

Schema validation. Every message must conform to the AgentBus envelope schema (from, to, message_type, correlation_id, payload). Messages that fail to parse or validate are dropped before they reach a handler.
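As a rough illustration of that check, the sketch below validates an envelope in Python. It assumes the envelope is JSON on the wire and that the four routing fields are non-empty strings; both are assumptions made for this example, as is the function name. Guardian’s actual rules may be stricter or different.

```python
import json
from typing import Optional

# Field names taken from the envelope schema described above.
REQUIRED_FIELDS = ("from", "to", "message_type", "correlation_id", "payload")


def parse_envelope(raw: bytes) -> Optional[dict]:
    """Return the envelope as a dict, or None if the message should be dropped."""
    try:
        envelope = json.loads(raw)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return None
    if not isinstance(envelope, dict):
        return None
    if any(field not in envelope for field in REQUIRED_FIELDS):
        return None
    # Assumption: routing fields are non-empty strings; payload is left untyped.
    for field in ("from", "to", "message_type", "correlation_id"):
        value = envelope[field]
        if not isinstance(value, str) or not value:
            return None
    return envelope
```

Returning None rather than raising keeps the drop decision in one place: the caller forwards the message only when it gets a dict back.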

ACLs. Agent cards define which agents can send to which subjects. A VM agent that should only accept commands from trusted senders can enforce that at the message layer before any code runs.
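One way to picture the ACL check is a default-deny lookup from sender to the subjects it may publish to. The sketch below is hypothetical: the post does not show the agent card format, so the `ACL` table, the agent names, and `may_publish` are invented for illustration. It also uses shell-style globs from Python’s `fnmatch` as a stand-in; in NATS a `*` matches a single token, so a real check would need token-aware matching.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical ACL: sender name -> subject patterns it may publish to.
ACL: Dict[str, List[str]] = {
    "laptop": ["agent.vm-agent.inbox"],
    "ci-runner": ["agent.*.inbox"],
}


def may_publish(acl: Dict[str, List[str]], sender: str, subject: str) -> bool:
    """Default-deny: unknown senders and unlisted subjects are rejected."""
    # Note: fnmatch globs are an approximation of NATS wildcards here.
    return any(fnmatchcase(subject, pattern) for pattern in acl.get(sender, []))
```

With this table, `laptop` can reach only the VM agent’s inbox, and a sender that is not listed at all cannot publish anywhere.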

Prompt injection detection. When you have agents dispatching messages to Claude, an adversary who can influence the message content can inject instructions. Guardian runs basic injection detection on incoming message text before it reaches the Claude adapter. None of this is foolproof, but it raises the floor over “accept everything and hope the LLM notices.”

The Claude adapter

The listener has three dispatch modes:

Mode	Command	Notes
Shell	`bash -c "<command>"`	Claude not involved
One-shot	`claude -p "<message>"`	Stateless
Session resume	`claude -p --resume <session-id>`	Stateful
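The table above can be read as a mapping from mode to command line. Here is a small Python sketch of that mapping. The mode strings and the function name are invented for this example, and passing the message as a trailing positional argument in the resume case is an assumption, since the table does not show where the message goes.

```python
from typing import List, Optional


def build_argv(mode: str, message: str, session_id: Optional[str] = None) -> List[str]:
    """Build the argv list for one incoming message (hypothetical mode names)."""
    if mode == "shell":
        # The message is the command itself; Claude is not involved.
        return ["bash", "-c", message]
    if mode == "one-shot":
        return ["claude", "-p", message]
    if mode == "session-resume":
        if not session_id:
            raise ValueError("session-resume requires a session id")
        # Assumption: the message follows the session id as a positional argument.
        return ["claude", "-p", "--resume", session_id, message]
    raise ValueError(f"unknown dispatch mode: {mode}")
```

Building an argv list, rather than a single shell string, avoids a second round of shell quoting for the two Claude modes. Shell mode still executes whatever the sender supplied, which is why the ACL check matters there.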

Session resume is the interesting one. Sessions are keyed by {sender}:{repo}, so when the laptop agent sends a follow-up message about the same codebase, the VM agent automatically resumes the prior Claude session. You get persistent context without managing conversation state yourself.

Sessions expire after an hour of idle time — enough for a working session, short enough that they don’t accumulate indefinitely.
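The keying and expiry rules fit in a few lines. The following Python sketch shows one possible shape for them, using an in-memory dict. The class and method names are hypothetical, and the real adapter may persist sessions differently.

```python
import time
from typing import Dict, Optional, Tuple

IDLE_TTL_SECONDS = 60 * 60  # one hour of idle time, as described above


class SessionStore:
    """Maps '{sender}:{repo}' to a Claude session id with idle expiry."""

    def __init__(self, ttl: float = IDLE_TTL_SECONDS) -> None:
        self.ttl = ttl
        self._sessions: Dict[str, Tuple[str, float]] = {}  # key -> (id, last_used)

    @staticmethod
    def key(sender: str, repo: str) -> str:
        return f"{sender}:{repo}"

    def remember(self, sender: str, repo: str, session_id: str,
                 now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._sessions[self.key(sender, repo)] = (session_id, now)

    def lookup(self, sender: str, repo: str,
               now: Optional[float] = None) -> Optional[str]:
        """Return a live session id and refresh its idle timer, else None."""
        now = time.time() if now is None else now
        k = self.key(sender, repo)
        entry = self._sessions.get(k)
        if entry is None:
            return None
        session_id, last_used = entry
        if now - last_used > self.ttl:
            del self._sessions[k]  # idle too long: forget the session
            return None
        self._sessions[k] = (session_id, now)
        return session_id
```

A listener would call `lookup` on each incoming message: a hit means resuming that session, and a miss means starting a fresh one and calling `remember` with the new id.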

Where it fits

vs. Discord / chat channels: The original coordination approach had agents posting to a shared channel and reading each other’s messages. The problem is that reading a channel means loading message history into an LLM context window: every coordination message costs tokens, and the history keeps growing. For anything beyond one-off exchanges, the token bill compounds fast. NATS messages are raw bytes, and routing them involves no LLM context, so the token cost of coordination drops to near zero.

vs. A2A: A2A is the emerging standard. AgentBus’s NATS transport is faster for real-time messaging, but A2A has industry backing and is likely to become the interop standard. The right long-term answer is to speak both: NATS internally, A2A HTTP at the boundary. That’s the direction the project is heading.

vs. MCP: Different layers. MCP gives Claude access to tools and resources. AgentBus lets multiple agents — Claude or otherwise — coordinate with each other. They’re complementary: the VM agent uses MCP tools within its Claude session, and AgentBus for messaging between machines.

vs. Claude Agent Teams: Agent Teams is Anthropic’s native multi-agent story and will eventually be the right answer when it supports remote teammates. Today it’s same-machine, Claude-only. AgentBus fills the gap for cross-machine coordination and heterogeneous agents.

vs. AutoGen/CrewAI/LangGraph: These are frameworks for defining agent behavior. AgentBus is transport infrastructure. They don’t compete.

The `--on-message` hook

One design choice: any process can be an agent.

agentbus listen --agent my-agent \
  --on-message './handle.sh'

The hook receives JSON on stdin. Its stdout becomes the reply. That’s the contract.

A shell script, a Python script, a compiled binary, a curl call to an external API — the bus doesn’t care what’s on the other end as long as it can read stdin and write stdout. “Heterogeneous” in a real sense, not just “any OpenAI-compatible model.”
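For example, a Python hook that honors this contract could look like the sketch below. It assumes the JSON on stdin is the message envelope with `from` and `payload` fields, and it replies with JSON; the reply shape is invented for illustration, since the contract only says that stdout becomes the reply.

```python
#!/usr/bin/env python3
import json
import sys


def handle(raw: str) -> str:
    """Turn one incoming JSON message into a reply string."""
    envelope = json.loads(raw)
    # Hypothetical reply shape: echo the payload back to the sender.
    reply = {"received_from": envelope.get("from"), "echo": envelope.get("payload")}
    return json.dumps(reply)


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # The whole contract: read the message from stdin, write the reply to stdout.
    stdout.write(handle(stdin.read()))
```

Saved as an executable script with a final `main()` call (say, a hypothetical ./handle.py), it could be passed to `--on-message` the same way as ./handle.sh above.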

Where this fits in Anthropic’s pattern catalog

Anthropic recently published five multi-agent coordination patterns — Generator-Verifier, Orchestrator-Subagent, Agent Teams, Message Bus, and Shared State.

AgentBus implements Pattern 4: Message Bus. The other four patterns can run on top of it — an orchestrator that dispatches subtasks via NATS subjects, or agent teams that share a subject namespace for coordination. The bus is infrastructure, not a pattern choice.

What the Anthropic catalog doesn’t cover is the cross-machine and heterogeneous-process dimensions: all five patterns assume agents that can talk to each other in-process or via HTTP. AgentBus’s contribution is making those patterns work across machines and across process types (shell scripts, Python daemons, Claude sessions, curl calls) without any of them needing to know about each other’s internals.

What’s next

The clearest gap is the A2A gateway — a bridge that lets external A2A clients talk to agents on the bus without knowing about NATS, and lets bus agents call external A2A endpoints without writing HTTP clients.

Beyond that: better agent discovery (right now it’s JSON files on disk, which works for a small fleet), and the Level 3 autonomous mode for the Claude adapter — multi-round execution loops with checkpoints and rollback.

The repo is at github.com/lockezhou18/agentbus-v2 if you want to run it or look at the internals. To get started:

bash scripts/setup.sh --name my-agent --listen