Architecture — Roxabi Factory

01

Hexagonal Core

The engine is built on Hexagonal Architecture (Cockburn — Ports & Adapters). The domain core is a hexagon: it defines its needs as ports (Python Protocol interfaces) and remains completely ignorant of how those needs are fulfilled. Every external concern — a messaging platform, a language model, a database — is a concrete adapter that implements a port.

The consequence is that the core never imports aiogram, discord.py, anthropic, or any other I/O library. It is testable in isolation, swappable at any boundary, and insulated from framework churn. A new LLM provider requires only a new adapter; the core is untouched.

The hexagonal core defines ports (interfaces) and knows nothing of their implementations. Inbound adapters normalize platform events into domain types; outbound adapters implement domain ports against specific technologies.

Three layers refine the pattern. Clean Architecture keeps dependencies pointing inward: domain at the centre, application use-cases around it, infrastructure and adapters at the outer rings. Hexagonal Architecture translates this into the port/adapter shape above. The Kernel extension goes further: the innermost layer is pure — no framework imports, no I/O, no mutable global state — so it is testable in complete isolation and replaceable without migration cost.

One Composition Root wires concrete implementations to ports. That is the only site in the codebase where infrastructure concretions are selected — everywhere else talks to protocols.
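The port/adapter split and the Composition Root can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: LlmPort, ReplyService, EchoLlmAdapter, and build_service are hypothetical names, and the adapter is a stand-in that touches no real provider SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class LlmPort(Protocol):
    """Outbound port (hypothetical): what the core needs from a language model."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class ReplyService:
    """Domain core: depends only on the port, never on a provider SDK."""

    llm: LlmPort

    def reply(self, user_text: str) -> str:
        return self.llm.complete(user_text.strip())


class EchoLlmAdapter:
    """Stand-in adapter. A real one would wrap a provider SDK in this class."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def build_service() -> ReplyService:
    """Composition root: the one place where a concrete adapter is selected."""
    return ReplyService(llm=EchoLlmAdapter())
```

In this sketch, adding a provider means writing another class with a `complete` method and changing only `build_service`; `ReplyService` stays as it is and can be tested with any object that satisfies the protocol.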

02

Hub-and-Spoke Topology

The engine runs as several independent processes connected by the message bus. One process is the hub — the single routing authority. All other processes are spokes: platform adapters (Telegram, Discord, CLI) and capability workers, each connected to the hub over NATS.

Inbound messages arrive at an adapter spoke, are normalized into a domain type, and published to the bus. The hub receives them, resolves which agent and pool should handle the conversation, and dispatches the turn. Responses stream back over the bus to the originating adapter spoke, which delivers them to the platform.

The hub is the single routing authority. Adapter spokes publish inbound messages and receive outbound responses over the bus. The agent pool (CliPool) receives dispatched turns and streams results back through the hub.

The hub never dies from a missing spoke. Adapter lookup failures are caught by the middleware pipeline; every pool is bounded to 100 queued items; and the hub is the sole creator of the shared readiness KV bucket — adapters and workers wait for it, never race to provision it.

One hub, many spokes. Independent processes — start, restart, or add spokes without touching the hub.
Scope isolation. Each conversation scope produces an independent pool. Two chats from the same user are two pools, two independent contexts.
Trust is Hub-side. Adapter spokes send trust=PUBLIC. Trust resolution is Hub-only — adapters never decide who may speak.
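The bound on queued items can be illustrated with a standard-library queue. This is a sketch only: the text does not say how the engine implements the bound or what it does with a turn that does not fit, and try_enqueue is a hypothetical helper.

```python
import asyncio

MAX_QUEUED = 100  # the per-pool bound stated in the text


def try_enqueue(queue: asyncio.Queue, item: object) -> bool:
    """Add an item without blocking; report False when the queue is full.

    Assumption: a full queue rejects the item. The caller decides what
    to do next (drop it, reply with a busy notice, and so on).
    """
    try:
        queue.put_nowait(item)
        return True
    except asyncio.QueueFull:
        return False
```

A queue built with `asyncio.Queue(maxsize=MAX_QUEUED)` accepts the first 100 items and refuses the 101st, so a slow or missing consumer cannot grow the hub's memory without limit.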

03

The NATS Message Bus

All inter-process communication runs over NATS. Every subject follows the convention factory.{domain}.{qualifier…} — domain first, qualifying tokens after. The subject tree is structured, not flat, so consumers can subscribe to a plane with a single wildcard.

Three planes organise the bus. Each plane has a fixed NATS type, a durability contract, and a keying shape:

Plane	Subject prefix	NATS type	Durability	When to use
Messages	`factory.{inbound,outbound}.<platform>.<bot_id>`	Core	Ephemeral	Bidirectional hub ↔ adapter routing of user content
Persistence	`factory.turns.>`	JetStream durable	At-least-once	Append-only state changes (conversation turns)
Typing / Lifecycle	`factory.typing.<platform>.<bot_id>`	Core	Ephemeral (lossy-OK)	Display-feedback events — typing indicators, progress signals

The rule for adding a subject: decide durability first. Hard guarantee needed → JetStream → Persistence plane. Lossy-OK → Core → Messages or Typing depending on direction. A new plane requires a separate architecture decision — and a justification that durability, direction, and lifecycle ownership all differ from the three existing planes.

The bus is a contract, not a pipe. Subject tokens must never contain dots; bot_id is validated at startup. A shadow subject that bypasses per-bot ACL rules is a startup error, not a runtime surprise.
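The naming convention and the no-dots rule could be enforced by a small builder such as the one below. The helper names (validate_token, subject, plane_wildcard) are hypothetical. The rejected characters follow NATS subject syntax, where "." separates tokens and "*" and ">" are wildcards.

```python
import re

ROOT = "factory"

# One subject token: non-empty, with no whitespace, ".", "*" or ">".
_TOKEN = re.compile(r"[^\s.*>]+")


def validate_token(token: str) -> str:
    """Return the token unchanged, or raise if it would corrupt the subject tree."""
    if not _TOKEN.fullmatch(token):
        raise ValueError(f"invalid subject token: {token!r}")
    return token


def subject(domain: str, *qualifiers: str) -> str:
    """Build factory.{domain}.{qualifier...} from validated tokens."""
    tokens = [validate_token(t) for t in (domain, *qualifiers)]
    return ".".join([ROOT, *tokens])


def plane_wildcard(domain: str) -> str:
    """Subscription pattern covering every subject under one domain."""
    return f"{ROOT}.{validate_token(domain)}.>"
```

Running the validation once at startup, for example on each configured bot_id, turns a malformed identifier into an immediate failure instead of a subject that silently lands in the wrong branch of the tree.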

04

Typed Contracts

Every message that crosses a process boundary is a typed schema. Two shared packages own this contract surface — one for transport primitives, one for domain schemas.

roxabi-nats — the transport SDK. Provides NatsAdapterBase, the connect helper, circuit-breaker, and serialization utilities. Pure transport — zero knowledge of subject names or domain semantics. Consumed by satellites via a pinned version tag.
roxabi-contracts — the shared schema package. Ships Pydantic models, subject string constants, and test doubles for every cross-process domain. Satellites import the same typed models the hub publishes against — drift between publisher and subscriber becomes a type error at import time, not a silent wire mismatch at runtime.

The streaming pipeline follows a two-stage event model. The LLM adapter emits LlmEvent (text deltas, tool calls, result). The StreamProcessor in the domain core transforms these into RenderEvent (text delta, tool summary) — a channel-agnostic representation of what should appear to the user. Each outbound adapter receives a RenderEvent stream and renders it in the platform's native way.

The StreamProcessor is the boundary between the LLM layer and the delivery layer. It is channel-agnostic and testable in isolation — no platform SDK in scope. Outbound adapters render the same RenderEvent stream each in their platform's native way.
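The two-stage event model can be sketched as a pure function from one event stream to another. The event names LlmEvent and RenderEvent come from the text; their fields, the kind tags, the process function, and the plain-text renderer are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LlmEvent:
    kind: str          # assumed tags: "text_delta", "tool_call", "result"
    payload: str = ""


@dataclass(frozen=True)
class RenderEvent:
    kind: str          # assumed tags: "text_delta", "tool_summary"
    text: str


def process(events: Iterable[LlmEvent]) -> Iterator[RenderEvent]:
    """Translate LLM-layer events into channel-agnostic render events."""
    for ev in events:
        if ev.kind == "text_delta":
            yield RenderEvent("text_delta", ev.payload)
        elif ev.kind == "tool_call":
            yield RenderEvent("tool_summary", f"tool: {ev.payload}")
        # "result" closes the turn and renders nothing in this sketch.


def render_plain(events: Iterable[RenderEvent]) -> str:
    """A stand-in outbound adapter: plain text, tool summaries on their own line."""
    parts = []
    for ev in events:
        parts.append(ev.text if ev.kind == "text_delta" else f"\n[{ev.text}]\n")
    return "".join(parts)
```

Because `process` has no platform SDK in scope, it can be tested with plain lists of events, and a second renderer (for a different platform) would consume the same RenderEvent stream unchanged.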

Every envelope that crosses the hub–adapter boundary carries a schema_version field. Receivers accept versions up to their expected maximum and drop strictly-greater versions with an ERROR log. A schema bump requires a coordinated deploy — no rolling migration without a version gate.
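The version gate described above amounts to one comparison on receipt. The sketch below uses a standard-library dataclass in place of the Pydantic models the contracts package ships, to stay dependency-free; Envelope, accept, and the value of MAX_SCHEMA_VERSION are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("contracts")

MAX_SCHEMA_VERSION = 2  # hypothetical: the highest version this receiver understands


@dataclass(frozen=True)
class Envelope:
    schema_version: int
    body: dict


def accept(envelope: Envelope, max_version: int = MAX_SCHEMA_VERSION) -> Optional[Envelope]:
    """Return the envelope if its version is supported, otherwise log and drop it."""
    if envelope.schema_version > max_version:
        log.error(
            "dropping envelope: schema_version %d exceeds supported maximum %d",
            envelope.schema_version,
            max_version,
        )
        return None
    return envelope
```

A receiver built this way keeps working when an older publisher is still running, and fails loudly (rather than misparsing) when a newer publisher is deployed first.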

05

Agent Pools

The hub never creates agents dynamically. Agents are immutable singletons defined at boot from TOML seed files: a model, a system prompt, a set of tools, a namespace. All mutable state lives in the pool, not the agent.

A pool is created on the first message that matches a routing key — a three-field tuple of (platform, bot_id, scope_id) that uniquely identifies a conversation scope. Two chats from the same user produce two independent routing keys and two independent pools. User identity for rate limiting is tracked separately and never collapsed into scope_id.

Agent — the AI brain. Immutable config: model, system prompt, permitted tools, plugins, namespace.
Binding — the mapping from a bot_id to an agent_name for a given conversation scope. Managed by the hub.
Pool — the live conversation context for one routing key. Holds the session, the compaction state, the memory scope. Evicted by LRU when the pool limit is reached.

Pool creation is serialised under a lock that wraps an explicit existence check plus LRU eviction. Inline pool-ID construction is forbidden — pool IDs are always derived via RoutingKey.to_pool_id(), producing a stable platform:bot_id:scope_id string that matches the per-bot ACL rules.
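Serialised creation with an existence check and LRU eviction can be sketched as follows. RoutingKey.to_pool_id() and its output format come from the text; Pool, PoolRegistry, and get_or_create are hypothetical, and a threading lock stands in for whatever lock the engine actually uses.

```python
import threading
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoutingKey:
    platform: str
    bot_id: str
    scope_id: str

    def to_pool_id(self) -> str:
        """The single place where pool IDs are built."""
        return f"{self.platform}:{self.bot_id}:{self.scope_id}"


@dataclass
class Pool:
    pool_id: str
    history: list = field(default_factory=list)  # placeholder for session state


class PoolRegistry:
    def __init__(self, max_pools: int) -> None:
        if max_pools < 1:
            raise ValueError("max_pools must be at least 1")
        self._max = max_pools
        self._pools: "OrderedDict[str, Pool]" = OrderedDict()
        self._lock = threading.Lock()

    def get_or_create(self, key: RoutingKey) -> Pool:
        pool_id = key.to_pool_id()
        with self._lock:
            pool = self._pools.get(pool_id)          # explicit existence check
            if pool is not None:
                self._pools.move_to_end(pool_id)     # mark as most recently used
                return pool
            while len(self._pools) >= self._max:
                self._pools.popitem(last=False)      # evict least recently used
            pool = Pool(pool_id)
            self._pools[pool_id] = pool
            return pool
```

Holding the lock across the check, the eviction, and the insert means two messages arriving together for the same routing key cannot each create a pool.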

Agents are configuration, not code. The engine dispatches turns; it does not decide what the agent knows or how it reasons. Those choices live in a TOML file, independently of the engine version.

06

The Worker Fleet

Capability computation is delegated to a fleet of workers. A worker is a compute instance that consumes tools to run a job — it is not a tool, not a provider, and not the agent. It runs a workerEngine: a coded, deterministic pipeline that calls three kinds of step.

The harness — one pure agentic turn where the model decides and its in-turn tools fire. Agency where you want a decision.
Tool calls — direct invocations of the tool plane: built-in primitives, remote satellites, skills, sub-agents.
Internal code — deterministic transformations with no model in the loop. Guarantees where you want a guarantee.
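The three kinds of step can be shown side by side in one small pipeline. This is a sketch under assumptions: run_job, the Harness and Tool type aliases, and the shape of the returned dictionary are hypothetical, and plain callables stand in for the model turn and the tool plane.

```python
from typing import Callable, Dict

Harness = Callable[[str], str]  # one agentic turn: the model decides
Tool = Callable[[str], str]     # one direct invocation of the tool plane


def run_job(raw: str, harness: Harness, tool: Tool) -> Dict[str, str]:
    """A coded pipeline that calls the harness instead of being driven by it."""
    # Internal code: deterministic normalisation, no model in the loop.
    text = " ".join(raw.split())
    # Harness: the single point in the pipeline where a decision is made.
    decision = harness(text)
    # Tool call: the pipeline, not the model, invokes the tool with that decision.
    output = tool(decision)
    # Internal code: the result shape is fixed by code, whatever the model said.
    return {"input": text, "decision": decision, "output": output}
```

The order of steps is written in code and does not depend on model output, so the pipeline behaves the same way on every run apart from the content of the decision itself.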

Workers are discovered by heartbeat. Each satellite announces itself on its heartbeat subject (factory.voice.tts.heartbeat, factory.image.heartbeat, …). The WorkerRegistry scores available workers and routes requests to the best candidate. On timeout or no-responders, the registry marks the worker stale and walks to the next candidate. Workers are automatically re-admitted on their next heartbeat.
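The routing behaviour described above (score, fall through on timeout, re-admit on heartbeat) can be sketched without any transport. The class below is not the engine's WorkerRegistry: its methods, the WorkerInfo record, and the rule that a higher score is better are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class WorkerInfo:
    worker_id: str
    score: float       # assumption: higher is better
    stale: bool = False


class WorkerRegistry:
    def __init__(self) -> None:
        self._workers: Dict[str, WorkerInfo] = {}

    def on_heartbeat(self, worker_id: str, score: float) -> None:
        """Register or re-admit a worker; a heartbeat always clears staleness."""
        self._workers[worker_id] = WorkerInfo(worker_id, score, stale=False)

    def candidates(self) -> List[WorkerInfo]:
        """Live workers, best score first."""
        live = [w for w in self._workers.values() if not w.stale]
        return sorted(live, key=lambda w: w.score, reverse=True)

    def request(self, send: Callable[[str], str]) -> Optional[str]:
        """Try each candidate in order; mark a worker stale when it times out."""
        for worker in self.candidates():
            try:
                return send(worker.worker_id)
            except TimeoutError:
                worker.stale = True
        return None  # no worker answered
```

Here `send` stands for a request/reply call to one worker. A worker that timed out is skipped by later requests until its next heartbeat arrives, at which point it competes on score again.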

The workerEngine is a deterministic pipeline. It calls three kinds of step — the harness (model turn), tool calls, and internal code — composing agency with guarantee. The WorkerRegistry routes requests to available workers by score, with automatic stale-detection.

The transport layer underneath workers follows a three-layer composition: NatsTransport owns all NATS I/O; WorkerPoolClient adds routing, circuit-breaker, and heartbeat subscription; domain clients wrap both with a typed codec. All transport-level methods return a Result type — no exceptions cross the transport boundary, and no internal detail leaks to users or logs.
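A Result type at the transport boundary can be sketched as two small dataclasses and one wrapper. The names Ok, Err, and call_transport, and the error codes, are hypothetical; the text only states that transport-level methods return a Result and let no exception through.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    code: str  # short, stable code; deliberately carries no internal detail


Result = Union[Ok[T], Err]


def call_transport(op: Callable[[], T]) -> "Result[T]":
    """Run a transport operation and convert any failure into an Err value."""
    try:
        return Ok(op())
    except TimeoutError:
        return Err("timeout")
    except Exception:
        # The original exception text (hosts, subjects, payloads) is not copied
        # into the result, so it cannot reach users through this path.
        return Err("transport_error")
```

Callers branch on the returned value with `isinstance(result, Ok)` instead of wrapping every call in try/except, and the set of failure codes becomes part of the typed contract.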

The harness is a callee of the engine, not a flavour of it. Agency where you want a decision; code where you want a guarantee. The line between them is explicit, not emergent.
