From e3ac28b9f829c54f88db20245ad58b7d64629d19 Mon Sep 17 00:00:00 2001
From: bndw
Date: Sun, 8 Mar 2026 21:54:30 -0700
Subject: initial: Axon protocol spec and README

---
 PROTOCOL.md | 355 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md   |  29 +++++
 2 files changed, 384 insertions(+)
 create mode 100644 PROTOCOL.md
 create mode 100644 README.md

diff --git a/PROTOCOL.md b/PROTOCOL.md
new file mode 100644
index 0000000..145c933
--- /dev/null
+++ b/PROTOCOL.md
@@ -0,0 +1,355 @@
+# Axon Protocol — High Level Design
+
+> A Nostr-inspired event relay protocol for AI agent infrastructure. Retains the core architectural insight — signed events, relay as message bus, filtered subscriptions — while cleaning up the crypto, encoding, and type system.
+
+---
+
+## Core Insight
+
+The relay is **Kafka at the edge, plus identity**. It is a log, not a database. It routes signed events between clients and stores them for replay. It is structurally incapable of understanding content it was not designed to index.
+
+---
+
+## Architecture
+
+```
+[client A] ──publish──▶ [relay] ──fanout──▶ [client B]
+                           │
+                        [index] ← id, pubkey, kind, created_at, tags
+                        [store] ← raw msgpack bytes (opaque)
+```
+
+Consumers (agents, indexers, report jobs) subscribe to the relay and maintain their own materialized views. The relay never aggregates, summarizes, or transforms content. Derived data is always downstream.
+
+---
+
+## Event Structure
+
+```
+Event {
+  id          bytes     // 32 bytes, SHA256 of canonical signing payload
+  pubkey      bytes     // 32 bytes, Ed25519 public key
+  created_at  int64     // unix timestamp
+  kind        uint16    // see Event Kinds registry
+  content     bytes     // opaque to the relay; msgpack bin type, no UTF-8 assumption
+  sig         bytes     // 64 bytes, Ed25519 signature over id
+  tags        []Tag
+}
+
+Tag {
+  name    string
+  values  []string
+}
+```
+
+### Signing
+
+The event ID is the `SHA256` of a canonical byte payload. All integers are big-endian. All strings are UTF-8.
+`||` denotes concatenation.
+
+```
+id  = SHA256(canonical_payload)
+sig = ed25519.Sign(privkey, id)
+```
+
+**canonical_payload:**
+
+| Field | Encoding |
+|---|---|
+| pubkey | `uint16(32)` \|\| 32 bytes |
+| created_at | `uint64` |
+| kind | `uint16` |
+| content | `uint32(len)` \|\| raw bytes (opaque; no UTF-8 assumption) |
+| tags | `SHA256(canonical_tags)`, 32 bytes (see below) |
+
+**canonical_tags:**
+
+Tags are sorted by `name` lexicographically (byte order). For ties on `name`, sort by first value lexicographically. Two tags that share the same `name` and the same first value are a **protocol error**, and the relay must reject the event with `400`. Tags are effectively keyed on (`name`, first value); duplicates are a bug or an attack.
+
+```
+uint16(num_tags)
+for each tag (in sorted order):
+    uint16(len(name)) || utf8(name)
+    uint16(num_values)
+    for each value:
+        uint32(len(value)) || utf8(value)
+```
+
+The `tags` field in `canonical_payload` is `SHA256(canonical_tags)` — a fixed 32-byte commitment regardless of tag count. Implementations may cache this hash to avoid re-sorting on repeated signature verification.
+
+**Full canonical_payload byte layout:**
+
+```
+[0:2]        uint16 = 32   pubkey length — always 32 for Ed25519; validate and reject if not 32;
+                           reserved for future key types
+[2:34]       bytes         pubkey
+[34:42]      uint64        created_at
+[42:44]      uint16        kind
+[44:48]      uint32        content length — wire format supports up to ~4GB but relay enforces
+                           a maximum of 65536 bytes (64KB); larger events are rejected with 413
+[48:48+n]    bytes         content (n bytes, n ≤ 65536)
+[48+n:80+n]  bytes         SHA256(canonical_tags), 32 bytes
+```
+
+Two implementations that agree on this layout will always produce the same `id` for the same event.
+
+---
+
+## Crypto
+
+| Purpose | Algorithm | Go package |
+|---|---|---|
+| Signing | Ed25519 | `crypto/ed25519` (stdlib) |
+| Key exchange | X25519 | `golang.org/x/crypto/curve25519` |
+| Encryption | ChaCha20-Poly1305 | `golang.org/x/crypto/chacha20poly1305` |
+| Hashing / event ID | SHA-256 | `crypto/sha256` (stdlib) |
+
+All dependencies are from the Go standard library or `golang.org/x/crypto`. No third-party crypto. Ed25519 keys are converted to X25519 for ECDH — one keypair serves both signing and encryption. ChaCha20-Poly1305 provides authenticated encryption (AEAD); the ciphertext cannot be tampered with without detection.
+
+---
+
+## Wire Format
+
+**Transport:** WebSocket (binary frames)
+**Serialization:** MessagePack
+
+MessagePack is binary JSON in spirit: a nearly identical data model plus a native binary type, with no schema and no codegen. Binary fields (`id`, `pubkey`, `sig`) are raw bytes on the wire, eliminating base64 encoding and simplifying the signing story.
+
+### Connection Authentication
+
+Authentication happens immediately on connect, before any other messages are accepted.
+
+```
+relay  → Challenge { nonce: bytes }            // 32 random bytes
+client → Auth      { pubkey: bytes, sig: bytes }
+relay  → Ok        { message: string }         // or Error then close
+```
+
+The client signs over `nonce || relay_url` to prevent replay to a different relay:
+
+```
+sig = ed25519.Sign(privkey, SHA256(nonce || utf8(relay_url)))
+```
+
+The relay verifies the signature, then checks the pubkey against its allowlist. An invalid signature returns `Error { code: 401 }`; a valid signature from a pubkey not on the allowlist returns `Error { code: 403 }`. In both cases the relay then closes the connection.
+
+**Allowlist:** the relay maintains a set of authorized pubkeys in config or the local database. Publish and subscribe are both gated on allowlist membership. Adding a user means adding their pubkey — no passwords, no tokens, no certificate infrastructure.
+
+### Client → Relay
+
+```
+Auth        { pubkey: bytes, sig: bytes }
+Subscribe   { sub_id: string, filter: Filter }
+Unsubscribe { sub_id: string }
+Publish     { event: Event }
+```
+
+### Relay → Client
+
+```
+Challenge     { nonce: bytes }
+EventEnvelope { sub_id: string, event: Event }
+Eose          { sub_id: string }
+Ok            { message: string }
+Error         { code: uint16, message: string }
+```
+
+Each message is a msgpack array: `[message_type, payload]` where `message_type` is a uint16.
+
+### Error Codes
+
+HTTP status codes, reused for familiarity.
+
+| Code | Meaning |
+|---|---|
+| 400 | Bad request (malformed message, invalid signature) |
+| 401 | Not authenticated |
+| 403 | Not authorized (pubkey not in allowlist) |
+| 409 | Duplicate event |
+| 413 | Message too large |
+
+The relay sends `Error` and keeps the connection open for recoverable conditions (e.g., a bad publish). For unrecoverable conditions (e.g., auth failure) it sends `Error` and then closes the connection.
+
+### Keepalive
+
+The relay sends a WebSocket ping every **30 seconds**. Clients must respond with a pong. Connections that miss two consecutive pings (60 seconds) are closed. Clients may also send pings; the relay will pong.
+
+---
+
+## Filters
+
+```
+Filter {
+  ids      []bytes      // match by event id
+  authors  []bytes      // match by pubkey
+  kinds    []uint16     // match by event kind
+  since    int64
+  until    int64
+  limit    int32
+  tags     []TagFilter
+}
+
+TagFilter {
+  name    string
+  values  []string      // match any
+}
+```
+
+---
+
+## Relay Internals
+
+The relay unmarshals only what it needs for indexing and routing. `content` is never parsed — it is opaque bytes as far as the relay is concerned.
+
+**On ingest:**
+1. Unmarshal the event envelope to extract index fields (`id`, `pubkey`, `kind`, `created_at`, `tags`)
+2. Verify signature: recompute `id` from the canonical payload, reject with `400` if it does not match the event's `id`, then check `ed25519.Verify(pubkey, id, sig)`
+3. Reject with `409` if `id` already exists — `id PRIMARY KEY` makes duplicate events impossible to store, and the fanout path checks an in-memory seen set before forwarding
+4. Write index fields to the index tables; ephemeral kinds (3000–3999) skip this step and the next and go straight to fanout
+5. Write the verbatim msgpack envelope bytes to `envelope_bytes` — the entire event exactly as received, not re-serialized
+6. Fanout to matching subscribers
+
+**On query/fanout:**
+- Read `envelope_bytes` from store
+- Forward directly to subscribers — no unmarshal, no remarshal
+
+**Index schema (SQLite; on Postgres, use `BYTEA` for `BLOB` columns and `BIGINT` for `created_at`):**
+
+```sql
+CREATE TABLE events (
+  id             BLOB PRIMARY KEY,
+  pubkey         BLOB NOT NULL,
+  created_at     INTEGER NOT NULL,
+  kind           INTEGER NOT NULL,
+  envelope_bytes BLOB NOT NULL  -- verbatim msgpack bytes of the full event, including content
+);
+
+CREATE TABLE tags (
+  event_id BLOB REFERENCES events(id),
+  name     TEXT NOT NULL,
+  value    TEXT NOT NULL
+);
+
+CREATE INDEX events_pubkey_idx ON events(pubkey);
+CREATE INDEX events_kind_idx ON events(kind);
+CREATE INDEX events_created_at_idx ON events(created_at);
+CREATE INDEX tags_name_value_idx ON tags(name, value);
+```
+
+---
+
+## Event Kinds
+
+Integer kinds with named constants. The integer is the wire format; the name is what appears in code and logs. Ranges enable efficient category queries without enumerating individual kinds.
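The named constants and range checks might look like this in Go. The values come from the Defined Kinds and Range Allocation tables in this section. The `Kind` type and the `Category`, `IsEphemeral`, and `IsJob` helpers are illustrative names and are not part of the spec. Kinds above 9999 fit in a `uint16` but fall outside every allocated range.

```go
package main

import "fmt"

// Kind is the uint16 wire value; constants follow the Defined Kinds table.
type Kind uint16

const (
	KindProfile     Kind = 0
	KindMessage     Kind = 1000
	KindDM          Kind = 2000
	KindProgress    Kind = 3000
	KindJobRequest  Kind = 5000
	KindJobResult   Kind = 6000
	KindJobFeedback Kind = 7000
)

// Category maps a kind to its block in the Range Allocation table.
func (k Kind) Category() string {
	switch {
	case k < 1000:
		return "identity & meta"
	case k < 2000:
		return "messaging"
	case k < 3000:
		return "encrypted messaging"
	case k < 4000:
		return "presence & ephemeral"
	case k < 5000:
		return "reserved"
	case k < 6000:
		return "job requests"
	case k < 7000:
		return "job results"
	case k < 8000:
		return "job feedback"
	case k < 9000:
		return "system / relay"
	case k < 10000:
		return "reserved"
	default:
		return "unallocated"
	}
}

// IsEphemeral reports whether the relay fans the event out without storing it.
func (k Kind) IsEphemeral() bool { return k >= 3000 && k < 4000 }

// IsJob covers requests, results, and feedback (5000-7999).
func (k Kind) IsJob() bool { return k >= 5000 && k < 8000 }

func main() {
	fmt.Println(KindJobFeedback.Category(), KindJobFeedback.IsJob()) // job feedback true
	fmt.Println(KindProgress.IsEphemeral(), KindDM.IsEphemeral())    // true false
}
```

Keeping the ranges as half-open comparisons mirrors the `kind >= lo AND kind < hi` form of the SQL range queries, so code and index queries use the same boundaries.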
+
+### Range Allocation
+
+| Range | Category |
+|---|---|
+| 0000 – 0999 | Identity & meta |
+| 1000 – 1999 | Messaging |
+| 2000 – 2999 | Encrypted messaging |
+| 3000 – 3999 | Presence & ephemeral |
+| 4000 – 4999 | Reserved |
+| 5000 – 5999 | Job requests |
+| 6000 – 6999 | Job results |
+| 7000 – 7999 | Job feedback |
+| 8000 – 8999 | System / relay |
+| 9000 – 9999 | Reserved |
+
+### Defined Kinds
+
+| Constant | Kind | Description |
+|---|---|---|
+| `KindProfile` | 0 | Identity metadata |
+| `KindMessage` | 1000 | Plain text note |
+| `KindDM` | 2000 | Encrypted direct message |
+| `KindProgress` | 3000 | Ephemeral progress/status indicator (thinking, agent steps, job status) |
+| `KindJobRequest` | 5000 | Request for agent work |
+| `KindJobFeedback` | 7000 | In-progress status / error |
+| `KindJobResult` | 6000 | Completed job output |
+
+### Range Queries
+
+```sql
+-- all job-related events
+WHERE kind >= 5000 AND kind < 8000
+
+-- ephemeral events (relay does not persist)
+WHERE kind >= 3000 AND kind < 4000
+```
+
+Ephemeral events (kind 3000–3999) are fanned out to subscribers but never written to the store.
+
+---
+
+## Threading
+
+Conversations use explicit `e` tags with mandatory role markers:
+
+```
+Tag{ name: "e", values: ["", "root"] }
+Tag{ name: "e", values: ["", "reply"] }
+```
+
+Root marker is required on all replies. No fallback heuristics.
+
+---
+
+## Direct Messages
+
+`KindDM` (2000) events carry ChaCha20-Poly1305 encrypted content. The recipient is identified by a `p` tag carrying their pubkey:
+
+```
+Tag{ name: "p", values: [""] }
+```
+
+The relay indexes the `p` tag to route DMs to the recipient's subscription. Content is opaque; the relay cannot decrypt it.
+
+---
+
+## Job Protocol
+
+Any client can publish a `KindJobRequest`; any agent subscribed to the relay can fulfill it.
+The flow:
+
+```
+KindJobRequest  (5000) → { kind: 5000, content: "", tags: [["t", ""]] }
+KindJobFeedback (7000) → { kind: 7000, content: "", tags: [["e", ""]] }
+KindJobResult   (6000) → { kind: 6000, content: "", tags: [["e", ""]] }
+```
+
+Multiple agents can compete to fulfill the same request. The requester can target a specific agent with a `p` tag.
+
+**Expiry:** job requests may include an `expires_at` tag carrying a unix timestamp. Agents must check this before starting work and skip expired requests. The relay does not enforce expiry — it is agent-side policy.
+
+```
+Tag{ name: "expires_at", values: [""] }
+```
+
+---
+
+## Consumers
+
+The relay is the log. Anything requiring derived data subscribes and maintains its own view:
+
+- **Search indexer** — subscribes to all events, feeds full-text index
+- **Daily report** — subscribes to past 24h, generates summary via agent
+- **Metrics collector** — counts event types, feeds dashboard
+- **Conversation summarizer** — subscribes to completed threads
+
+Each consumer is independent and can rebuild from relay replay on restart.
+
+**Resumption:** consumers track their own position by storing the `created_at` of the last processed event and resuming with a `since` filter on restart. Use event `id` to deduplicate any overlap at the boundary.
+
+---
+
+## Threat Model
+
+**DM metadata:** `KindDM` content is encrypted and opaque to the relay, but sender pubkey and recipient `p` tag are stored in plaintext. The relay operator can see who is talking to whom and when. Content is private; the social graph is not.
+
+---
+
+## What This Is Not
+
+- Not a database. Don't query it like one.
+- Not a general message queue. It has no consumer groups or offset tracking — consumers manage their own position.
+- Not decentralized. Single relay, single operator. Multi-relay federation is out of scope.
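The **Resumption** rule under Consumers can be sketched as a small consumer-side cursor. This sketch makes two assumptions. First, `since` is inclusive, so a resumed subscription replays events that share the stored `created_at`. Second, the consumer persists the ids it processed at that timestamp alongside the timestamp itself. `Cursor`, `NewCursor`, and `Accept` are hypothetical names, and persistence is left out.

```go
package main

import "fmt"

// Cursor is a hypothetical consumer position: the highest created_at
// processed so far plus the ids already handled at exactly that timestamp.
type Cursor struct {
	Last     int64
	boundary map[[32]byte]bool
}

// NewCursor restores a position saved by an earlier run.
func NewCursor(last int64, seenAtLast [][32]byte) *Cursor {
	c := &Cursor{Last: last, boundary: map[[32]byte]bool{}}
	for _, id := range seenAtLast {
		c.boundary[id] = true
	}
	return c
}

// Since is the value for the resumed subscription's `since` filter.
func (c *Cursor) Since() int64 { return c.Last }

// Accept reports whether an event should be processed and advances the cursor.
// Duplicates are only possible at the boundary timestamp, where replay overlaps.
func (c *Cursor) Accept(id [32]byte, createdAt int64) bool {
	switch {
	case createdAt > c.Last:
		c.Last = createdAt
		c.boundary = map[[32]byte]bool{id: true}
	case createdAt == c.Last:
		if c.boundary[id] {
			return false // already processed before the restart
		}
		c.boundary[id] = true
	}
	// createdAt < c.Last: a live event with an older timestamp; process it
	// but leave the cursor unchanged.
	return true
}

func main() {
	a, b, d := [32]byte{1}, [32]byte{2}, [32]byte{3}
	// Restored from storage: last created_at was 100; a and b were handled at 100.
	c := NewCursor(100, [][32]byte{a, b})
	// Resubscribe with since = c.Since(); the relay replays a and b
	// (same created_at) before delivering the new event d.
	fmt.Println(c.Since(), c.Accept(a, 100), c.Accept(b, 100), c.Accept(d, 101)) // 100 false false true
}
```

Storing only the ids at the final timestamp keeps the persisted state small. Events strictly older than `since` are never replayed, so they never need deduplication.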
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6f6a363
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+# Axon
+
+A signed event relay protocol for AI agent infrastructure.
+
+Axon is the transport and identity layer for systems where agents, humans, and automated jobs need to communicate over a shared bus. It is a Nostr-inspired protocol, retaining the core insight — signed events, relay as append-only log, filtered subscriptions — while making cleaner choices in crypto, encoding, and the type system.
+
+## Design Principles
+
+- **The relay is a log, not a database.** It routes and stores signed events. Derived data lives downstream in consumers.
+- **Identity is a keypair.** Ed25519 public keys are the unit of identity. No passwords, no tokens, no certificate infrastructure.
+- **Content is opaque.** The relay indexes what it needs for routing and stores the rest as raw bytes. It cannot read what it was not designed to index.
+- **Kafka at the edge, plus identity.** Filtered subscriptions over WebSocket give browsers and agents direct access to the event stream without a gateway layer.
+
+## Protocol
+
+See [PROTOCOL.md](PROTOCOL.md) for the full specification, including:
+
+- Event structure and canonical signing payload
+- Crypto stack (Ed25519, X25519, ChaCha20-Poly1305)
+- Wire format (MessagePack over WebSocket)
+- Connection authentication
+- Event kind registry and range allocation
+- Job protocol for agentic workloads
+- Relay internals and index schema
+- Threat model
+
+## Status
+
+Protocol design. No implementation yet.
-- 
cgit v1.2.3