Protocol Overview
The Diminuendo wire protocol defines a structured, bidirectional communication layer between frontend clients and the gateway. It is designed for real-time streaming of AI agent events (thinking blocks, tool calls, terminal output, file mutations) while maintaining the strict ordering and persistence guarantees required for reliable session replay.
Transport and Encoding
The protocol operates exclusively over WebSocket connections (RFC 6455). Every frame is a UTF-8-encoded JSON object. Binary frames are not used. Compression (perMessageDeflate) is disabled to minimize latency on the hot path — text deltas arrive at sub-millisecond intervals during active turns, and decompression overhead is unacceptable at that frequency.
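A minimal client-side sketch of these framing rules follows. It accepts only text frames that carry a JSON object with a string type field and rejects everything else. The helper names decodeFrame and encodeFrame and the WireMessage type are hypothetical and are not part of any Diminuendo SDK.

```typescript
// Hypothetical wire type: any JSON object whose `type` discriminator is a string.
type WireMessage = { type: string; [field: string]: unknown };

// Decode one inbound frame. The browser WebSocket API delivers text frames
// as strings and binary frames as Blob or ArrayBuffer. A non-string payload
// therefore means a binary frame, which this protocol never uses.
function decodeFrame(data: unknown): WireMessage {
  if (typeof data !== "string") {
    throw new Error("binary frames are not part of the protocol");
  }
  const parsed: unknown = JSON.parse(data); // throws SyntaxError on malformed JSON
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    Array.isArray(parsed) ||
    typeof (parsed as { type?: unknown }).type !== "string"
  ) {
    throw new Error("frame is not a JSON object with a string type field");
  }
  return parsed as WireMessage;
}

// Encode an outbound message. WebSocket.send() transmits a string as a
// UTF-8 text frame.
function encodeFrame(message: WireMessage): string {
  return JSON.stringify(message);
}
```

In a browser, these helpers would sit in ws.onmessage = (e) => handle(decodeFrame(e.data)) and ws.send(encodeFrame(msg)). Because permessage-deflate is negotiated with the server, the client does not need a setting to keep compression off.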
The gateway supports protocol versions 1 and 2. The welcome message includes a protocolVersion field. Clients can negotiate their preferred version, and the gateway selects the highest mutually supported version. Version negotiation is implemented in src/protocol/versioning.ts.
Protocol Version
The gateway currently supports protocol versions 1 and 2. The version is transmitted in the initial welcome event and is available as a constant in every SDK. Additive changes, such as new event types or new optional fields, do not increment the version. Version increments are reserved for breaking wire-format changes.
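The rule "select the highest mutually supported version" can be sketched as follows. This page does not say how a client advertises its versions, so the clientVersions list and the negotiateVersion name are assumptions. The authoritative logic lives in src/protocol/versioning.ts.

```typescript
// Versions this gateway build supports, per this page.
const GATEWAY_VERSIONS: readonly number[] = [1, 2];

// Return the highest version present in both lists, or null when there is
// no overlap. What the gateway does in the null case is not described here.
// How the client's list reaches the gateway is an assumption of this sketch.
function negotiateVersion(
  clientVersions: readonly number[],
  gatewayVersions: readonly number[] = GATEWAY_VERSIONS,
): number | null {
  const mutual = clientVersions.filter((v) => gatewayVersions.includes(v));
  return mutual.length > 0 ? Math.max(...mutual) : null;
}
```

For example, under this rule a client that supports versions 1 through 3 would be served version 2.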
Connection Lifecycle
Every WebSocket connection progresses through a deterministic sequence of phases:
Connect
The client opens a WebSocket connection to ws(s)://host:port/ws. The gateway validates the Origin header against its allowlist (bypassed in dev mode) and performs a CSRF check for browser-origin connections.
Welcome
The gateway immediately sends a welcome event containing the protocol version and whether authentication is required. A connected event follows with the assigned clientId and the heartbeat interval.
Authenticate
If requiresAuth is true, the client must send an authenticate message with a valid JWT or API key. The gateway verifies the token (via Auth0 JWKS in production) and responds with authenticated containing the user’s identity. In dev mode, authentication is automatic: the gateway assigns a synthetic identity (developer@example.com) and sends authenticated without requiring a token.
Session Interaction
After authentication, the client may send any of the 49 message types: listing sessions, creating sessions, joining sessions, running turns, managing automations, and so on. Messages sent before authentication (except authenticate itself) are rejected with a NOT_AUTHENTICATED error.
Join Session
To receive streaming events for a session, the client sends join_session. The gateway responds with a state_snapshot, which is a complete picture of the session’s current state, and subscribes the client to all future events for that session.
Message Format
Every message, whether client-to-server or server-to-client, is a JSON object with a type field that serves as the discriminator.
The type field is always a string. Client messages use 49 distinct types, and server events use 100+. The gateway’s Effect Schema parser validates every inbound message against the full union of client message schemas and rejects anything that does not match with an INVALID_MESSAGE error.
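The following gateway-side sketch shows the inbound checks described here and in the connection lifecycle. Anything that does not match a known client message shape is rejected with INVALID_MESSAGE. Until the connection has authenticated, every message except authenticate is rejected with NOT_AUTHENTICATED. The real gateway uses Effect Schema. This sketch swaps in hand-written type guards, covers only three of the 49 message types, and assumes the token and sessionId field names. Only type, afterSeq and clientTs appear on this page.

```typescript
// Three of the client message types, with partly assumed field names.
type ClientMessage =
  | { type: "authenticate"; token: string }
  | { type: "join_session"; sessionId: string; afterSeq?: number }
  | { type: "ping"; clientTs: number };

type Admission =
  | { ok: true; message: ClientMessage }
  | { ok: false; code: "INVALID_MESSAGE" | "NOT_AUTHENTICATED" };

// Hand-written stand-in for schema validation: switch on the discriminator,
// then check the fields that variant requires.
function isClientMessage(value: unknown): value is ClientMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  switch (v.type) {
    case "authenticate":
      return typeof v.token === "string";
    case "join_session":
      return (
        typeof v.sessionId === "string" &&
        (v.afterSeq === undefined || Number.isInteger(v.afterSeq))
      );
    case "ping":
      return typeof v.clientTs === "number";
    default:
      return false; // unknown or missing discriminator
  }
}

// `value` is an already-parsed JSON frame. Validation runs first, so a
// malformed message is INVALID_MESSAGE even before authentication.
function admit(value: unknown, authenticated: boolean): Admission {
  if (!isClientMessage(value)) return { ok: false, code: "INVALID_MESSAGE" };
  if (!authenticated && value.type !== "authenticate") {
    return { ok: false, code: "NOT_AUTHENTICATED" };
  }
  return { ok: true, message: value };
}
```

Because type is a literal-string discriminator, TypeScript narrows each variant inside the switch. That property makes a single type field sufficient for routing 49 message types.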
Sequence Numbers
Events that belong to a session carry a seq field, a per-session, monotonically increasing integer. Sequence numbers serve three purposes:
- Ordering: clients can sort events by seq to reconstruct the correct order, for example when replayed events and live events arrive interleaved after a reconnect.
- Deduplication: replayed events carry the same seq as the original, so clients can skip events they have already processed.
- Resumption: when reconnecting, clients pass afterSeq in the join_session message to receive only the events they missed.
Sequence numbering for each session starts at seq: 1.
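Deduplication and resumption can be sketched as a small per-session tracker on the client. The SeqTracker class, its method names and the sessionId field on join_session are hypothetical. Only seq and afterSeq come from the protocol. The tracker records every seq it has applied, so a replayed duplicate is skipped even when it arrives after newer live events.

```typescript
// Assumed minimal shape of a session event.
interface SessionEvent {
  type: string;
  seq: number;
}

// Assumed shape of join_session. Only `type` and `afterSeq` are documented.
type JoinSession = { type: "join_session"; sessionId: string; afterSeq?: number };

class SeqTracker {
  private readonly seen = new Set<number>(); // every seq already applied
  private highest: number | undefined; // last (highest) seq received

  // Returns true if the event is new and should be applied, and false for a
  // duplicate such as an event replayed after a reconnect.
  accept(event: SessionEvent): boolean {
    if (this.seen.has(event.seq)) return false;
    this.seen.add(event.seq);
    if (this.highest === undefined || event.seq > this.highest) {
      this.highest = event.seq;
    }
    return true;
  }

  // Builds the message to send after reconnecting. If nothing has been
  // received yet, afterSeq is omitted, as on a first join.
  rejoin(sessionId: string): JoinSession {
    return this.highest === undefined
      ? { type: "join_session", sessionId }
      : { type: "join_session", sessionId, afterSeq: this.highest };
  }
}
```

Keeping a set costs memory that grows with session length. A client that applies events strictly in seq order could keep only the high-water mark instead.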
Timestamps
Events include a ts field containing the Unix epoch time, in milliseconds, at which the gateway generated (or relayed) the event. Timestamps are server-authoritative, so clients should not rely on their own wall clock for ordering.
The ping/pong mechanism exposes both clientTs (echoed back from the client’s ping) and serverTs (the gateway’s timestamp at pong time), enabling clients to compute round-trip latency and approximate clock skew.
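The arithmetic below follows the estimate used by Cristian's algorithm. Round-trip time is the client's receive time minus clientTs. Skew compares serverTs with the midpoint of the round trip. The estimate assumes symmetric network delay. The pong type name and its exact shape are assumptions beyond the two documented fields.

```typescript
// clientTs and serverTs are documented. The "pong" type name is assumed.
interface Pong {
  type: "pong";
  clientTs: number; // client's Date.now() when it sent the ping, echoed back
  serverTs: number; // gateway time when it produced the pong
}

// receivedAt is the client's Date.now() when the pong arrived.
function measure(pong: Pong, receivedAt: number): { rttMs: number; skewMs: number } {
  const rttMs = receivedAt - pong.clientTs;
  // Assume the pong was stamped halfway through the round trip. Positive
  // skew means the gateway clock is ahead of the client clock.
  const skewMs = pong.serverTs - (pong.clientTs + rttMs / 2);
  return { rttMs, skewMs };
}
```

For example, a ping sent at client time 1000 whose pong arrives at 1100 with serverTs 1550 gives a 100 ms round trip. Under these assumptions, it also implies that the gateway clock is about 500 ms ahead.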
Event Classification
Server events fall into two persistence categories, which determine whether they survive a gateway restart and are available for replay:
Persistent Events
Stored in the session’s SQLite database. Available for replay via get_events and join_session with afterSeq. Examples: turn_started, turn_complete, tool_call, tool_result, question_requested, session_state, sandbox_ready, sandbox_removed.
Ephemeral Events
Broadcast to currently connected subscribers only. Not stored. Lost if no client is listening. Examples: text_delta, thinking.progress, terminal.stream, heartbeat, usage.update.
Losing ephemeral events is safe for replay because persistent events (such as turn_started and turn_complete) capture the final, authoritative state.
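The gateway-side sketch below illustrates the split, with an in-memory array standing in for the per-session SQLite database. It assumes that ephemeral events still consume a seq, which is what would make the gap events described under Reconnection possible. It classifies only the example types listed above. SessionLog, ReplayItem and the gap range fields are hypothetical names. heartbeat is left out of the ephemeral set because it carries only a ts field.

```typescript
// Example ephemeral types from this page. A real gateway classifies every type.
const EPHEMERAL_TYPES = new Set([
  "text_delta",
  "thinking.progress",
  "terminal.stream",
  "usage.update",
]);

interface SessionEvent {
  type: string;
  seq: number;
  ts: number;
}

type ReplayItem =
  | { kind: "event"; event: SessionEvent }
  | { kind: "gap"; fromSeq: number; toSeq: number }; // inclusive missing range

class SessionLog {
  private readonly stored: SessionEvent[] = []; // persistent events, seq-ascending
  private head = 0; // highest seq assigned so far

  // Assigns the next seq, stores the event only if it is persistent, and
  // returns it for broadcast to the current subscribers.
  publish(type: string, ts: number): SessionEvent {
    const event = { type, seq: ++this.head, ts };
    if (!EPHEMERAL_TYPES.has(type)) this.stored.push(event);
    return event;
  }

  // Replays persistent events after `afterSeq`. Sequence numbers that were
  // used by unstored ephemeral events are reported as gap ranges.
  replay(afterSeq: number): ReplayItem[] {
    const items: ReplayItem[] = [];
    let expected = afterSeq + 1;
    for (const event of this.stored) {
      if (event.seq <= afterSeq) continue;
      if (event.seq > expected) {
        items.push({ kind: "gap", fromSeq: expected, toSeq: event.seq - 1 });
      }
      items.push({ kind: "event", event });
      expected = event.seq + 1;
    }
    if (expected <= this.head) {
      items.push({ kind: "gap", fromSeq: expected, toSeq: this.head });
    }
    return items;
  }
}
```

Take a turn whose text arrives as two text_delta events. Replaying it yields turn_started, then a gap covering the two deltas, then turn_complete. The client rebuilds the final text from the persistent event rather than from the lost deltas.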
Heartbeat
The gateway sends a heartbeat event every 30 seconds to all session topics with active subscribers. The heartbeat contains only a ts field.
The interval is also advertised in the connected event via the heartbeatIntervalMs field, so clients need not hardcode the value.
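A client can turn the advertised interval into a dead-connection detector, which is one of the two triggers in the reconnection procedure. The HeartbeatWatchdog name, the choice to reset on any inbound frame, and the 2x grace factor are all assumptions of this sketch. Heartbeats go only to session topics with active subscribers, so the watchdog is meaningful only after join_session.

```typescript
class HeartbeatWatchdog {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly timeoutMs: number;
  private readonly onTimeout: () => void;

  // intervalMs: heartbeatIntervalMs from the connected event.
  // graceFactor: how many intervals may pass in silence (assumed 2).
  constructor(intervalMs: number, onTimeout: () => void, graceFactor = 2) {
    this.timeoutMs = intervalMs * graceFactor;
    this.onTimeout = onTimeout;
  }

  // Call after join_session and on every inbound frame, heartbeat or not.
  reset(): void {
    this.stop();
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.onTimeout();
    }, this.timeoutMs);
  }

  // Call when the socket closes or the client leaves the session.
  stop(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }
}
```

A typical wiring would be new HeartbeatWatchdog(connected.heartbeatIntervalMs, () => ws.close()), with dog.reset() called in ws.onmessage. Closing the socket then hands control to the same close handler that starts reconnection.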
Reconnection
The protocol is designed for graceful recovery from disconnections. The reconnection procedure is:
Detect Disconnection
Either the WebSocket close event fires or the heartbeat timeout expires. SDKs with autoReconnect enabled handle this automatically.
Re-establish Connection
Open a new WebSocket, authenticate, and receive the welcome / connected / authenticated sequence as normal.
Rejoin with afterSeq
Send join_session with the afterSeq field set to the last seq received before disconnection. The gateway replays all persistent events after that sequence number.
Handle Gap Events
If the gateway detects that some events between the client’s afterSeq and the current head are missing (e.g., because ephemeral events were not persisted), it sends a gap event indicating the range of missing sequence numbers.
Rate Limiting
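The sketch below implements a per-connection sliding window counter with this section's limits of 60 messages per 10-second window. It uses the standard approximation: the count in the current fixed window, plus the previous window's count weighted by how much of that window still overlaps the trailing 10 seconds. The class name, the tryAcquire method, and how a caller handles a false result are assumptions. The gateway's actual counter may differ.

```typescript
class SlidingWindowCounter {
  private windowStart = 0; // start (ms) of the current fixed window
  private current = 0; // messages accepted in the current fixed window
  private previous = 0; // messages accepted in the window before it
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 60, windowMs = 10_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a message arriving at `now` (ms) is within the limit.
  tryAcquire(now: number): boolean {
    const windowStart = now - (now % this.windowMs);
    if (windowStart !== this.windowStart) {
      // Roll forward. If more than one window elapsed, nothing carries over.
      this.previous =
        windowStart - this.windowStart === this.windowMs ? this.current : 0;
      this.current = 0;
      this.windowStart = windowStart;
    }
    // Weight the previous window by the fraction still inside the trailing window.
    const elapsedFraction = (now - windowStart) / this.windowMs;
    const estimated = this.previous * (1 - elapsedFraction) + this.current;
    if (estimated >= this.limit) return false;
    this.current += 1;
    return true;
  }
}
```

A gateway would call tryAcquire(Date.now()) for each inbound message and reply with an error instead of processing the message when the result is false. The weighted estimate avoids storing one timestamp per message, at the cost of being approximate near window boundaries.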
The gateway enforces a per-connection rate limit of 60 messages per 10-second window using a sliding window counter. Messages that exceed this limit are not processed and receive an immediate error response.
What’s Next
Client Messages
Complete reference for all 49 client-to-server message types.
Server Events
Complete reference for all 100+ server-to-client event types.
Error Handling
Error codes, recovery strategies, and sanitization rules.