Technical Deep-Dive into Eragon's Architecture

Between March 28 and April 7, we shipped five releases across the OpenClaw open-source project that powers Eragon's agent runtime. This post is a technical walkthrough of everything that landed — the architectural decisions, the tradeoffs, and why it matters for the agents running your operations.

1. Memory Dreaming — How Your Agent Consolidates What It Knows

The most novel feature in the batch. A background memory consolidation system inspired by how human sleep consolidates short-term memories into long-term ones.

The problem it solves.

Agents accumulate context constantly — from conversations, tool calls, search results, documents. But not all of that context is equally valuable. Some things matter once and never again. Other things keep coming up across different tasks, different days, different queries. The challenge is figuring out which is which, automatically, without a human curating the agent's memory.

Memory Dreaming is a probabilistic system that promotes memories based on how consistently useful they've been across diverse contexts over time. It's not summarization. It's not basic RAG retrieval. It's a multi-stage consolidation pipeline that mirrors how episodic memory works in biological systems.

How it works.

A unified cron sweep (default anchor: 3 AM) drives three ordered phases — Light, REM, and Deep — each on its own cadence. The system hooks into the agent's heartbeat session via a before_agent_reply plugin hook.

Phase 1 — Light Sleep runs roughly every 6 hours. It reads short-term recall entries within a 2-day lookback window, deduplicates by Jaccard similarity, and writes a consolidated block into the current daily memory file. It also spawns a transient subagent to generate a poetic diary entry — a narrative reflection written by an LLM with a specific persona prompt. The subagent session is deleted after the entry is collected.
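The dedup step can be sketched with token-set Jaccard similarity — a minimal illustration, not OpenClaw's actual implementation; the function names are assumptions, and the default threshold here borrows the 0.88 value the REM phase uses:

```javascript
// Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = new Set(a.toLowerCase().split(/\s+/));
  const setB = new Set(b.toLowerCase().split(/\s+/));
  const intersection = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// Keep an entry only if it is not a near-duplicate of one already kept.
function dedupe(entries, threshold = 0.88) {
  const kept = [];
  for (const entry of entries) {
    if (!kept.some((k) => jaccard(k, entry) >= threshold)) kept.push(entry);
  }
  return kept;
}
```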

Phase 2 — REM Sleep runs weekly. This is where pattern detection happens. The system aggregates concept tags across recent memory entries, computes theme strength as count / entries.length * 2, and surfaces recurring themes above a 0.75 threshold. It then scores candidate entries using a 4-component confidence formula, deduplicates at 0.88 Jaccard similarity, and keeps the top 3 entries at confidence ≥ 0.45. The output is a set of reflections and "possible lasting truths."
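The theme-strength formula from the paragraph above can be sketched directly (function name and entry shape are assumptions):

```javascript
// REM-phase theme detection: a tag's strength is
// count / entries.length * 2, surfaced above a 0.75 threshold.
function recurringThemes(entries, threshold = 0.75) {
  const counts = new Map();
  for (const entry of entries) {
    for (const tag of new Set(entry.tags)) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  const themes = [];
  for (const [tag, count] of counts) {
    const strength = (count / entries.length) * 2;
    if (strength > threshold) themes.push({ tag, strength });
  }
  return themes;
}
```

With four entries, a tag must appear in more than 1.5 of them (i.e., at least twice) to clear the 0.75 bar — so the scaling factor effectively asks "does this concept show up in a meaningful fraction of recent memory?"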

Phase 3 — Deep Sleep runs daily. This is the promotion gate. It ranks all recall entries by a weighted composite promotion score and writes the winners to MEMORY.md — the durable long-term store.

The promotion score.

This is the core of the system. Every memory candidate is scored across seven weighted dimensions:

score = 0.24 × frequency
      + 0.30 × relevance
      + 0.15 × diversity
      + 0.15 × recency
      + 0.10 × consolidation
      + 0.06 × conceptual
      + phaseBoost

Frequency (0.24) measures how often the memory has been recalled, normalized as log1p(signalCount) / log1p(10). Relevance (0.30) is the average embedding similarity score across all recall events. Diversity (0.15) counts the number of unique queries and contexts that triggered the memory — a memory that's useful in many different situations scores higher than one that's useful in the same situation repeatedly. Recency (0.15) applies exponential decay with a configurable half-life (default 14 days). Consolidation (0.10) measures spread across calendar time. Conceptual (0.06) is a simple concept tag count normalized against a ceiling of 6.

The phaseBoost adds +0.05 from Light Sleep hits and +0.08 from REM Sleep hits, both on a 14-day half-life decay. This creates the multi-stage pipeline: memories that survive Light and REM get a boost going into Deep, mimicking biological consolidation.

All three gates must pass for promotion: composite score ≥ 0.80, recall count ≥ 3, unique queries ≥ 3.
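Putting the weights and gates together, here is a minimal sketch of the promotion decision — component values are assumed to be pre-normalized to [0, 1], and the record shape is illustrative, not OpenClaw's actual schema:

```javascript
// Weighted composite score from the formula above.
function promotionScore(m) {
  return (
    0.24 * m.frequency +
    0.30 * m.relevance +
    0.15 * m.diversity +
    0.15 * m.recency +
    0.10 * m.consolidation +
    0.06 * m.conceptual +
    m.phaseBoost
  );
}

// Deep Sleep promotion gate: all three conditions must pass.
function shouldPromote(m) {
  return (
    promotionScore(m) >= 0.80 &&
    m.recallCount >= 3 &&
    m.uniqueQueries >= 3
  );
}
```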

Recall tracking under the hood.

Every time memory_search returns results, recordShortTermRecalls() fires immediately as a fire-and-forget operation. It upserts a ShortTermRecallEntry into a JSON file under filesystem advisory lock, tracking recall count, query hashes (SHA-1 prefix), days seen, embedding scores, and concept tags.

Aging is controlled by two parameters: recencyHalfLifeDays (default 14 — at 14 days recency = 0.5, at 28 days = 0.25) and maxAgeDays (default 30 — hard cutoff, entries older than this are excluded entirely).
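The aging math is plain half-life decay plus a hard cutoff; a sketch under those two defaults:

```javascript
// Recency halves every halfLifeDays; entries past maxAgeDays are
// excluded entirely (score 0).
function recencyScore(ageDays, halfLifeDays = 14, maxAgeDays = 30) {
  if (ageDays > maxAgeDays) return 0; // hard cutoff
  return Math.pow(0.5, ageDays / halfLifeDays);
}
```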

Why this matters.

This isn't a feature you interact with directly. It's infrastructure that makes your agent quietly better over time. The agent remembers what consistently matters and lets go of what doesn't — without anyone maintaining it.

2. Task Flows — Durable Background Orchestration

The most architecturally significant change in the batch. Subagents, ACP tasks, cron jobs, and background CLI execution are now unified under a single SQLite-backed task ledger.

The problem it solves.

Before Task Flows, every background execution mechanism — subagents, ACP tasks, cron — was tracked independently. If you wanted to orchestrate a multi-step workflow (spawn agent A, wait for result, spawn agent B, handle failure, retry), you had to wire it together yourself. There was no unified view of what was running, no durable state, and no clean way to cancel a workflow mid-flight.

Two flow modes.

task_mirrored is automatic. Every new subagent or ACP task gets a flow wrapper created by ensureSingleTaskFlow(), with status synced in real time. If you're already using subagents or ACP, you get flow tracking for free.

managed is explicit. A plugin calls createManagedTaskFlow() with an owner key, a controller ID, and a goal. The controller ID is immutable and identifies the orchestration logic. The plugin then calls runTaskInFlow() to spawn child tasks and drives state transitions manually.

The state machine.

TaskFlow: queued → running → waiting → running (resume)
                          ├→ blocked → running (retry)
                          └→ succeeded | failed | cancelled | lost

The blocked state is distinct from failed. It's triggered when a child task ends with terminalOutcome: "blocked" — meaning authorization denied or a write-blocker, not an error. This surfaces permission failures at the flow level without conflating them with crashes or bugs.

Concurrency control.

The SQLite schema uses optimistic locking via a revision field. Every write requires an expectedRevision parameter; mismatches return { applied: false, reason: "revision_conflict" }. This prevents race conditions when multiple child tasks report back simultaneously.

Sticky cancel intent solves another race condition: once cancelRequestedAt is set on a flow, it cannot be unset. Any subsequent runTaskInFlow() call immediately returns { created: false }. This guarantees no child task can be spawned after cancellation is requested, even if the cancel and spawn messages cross in flight.
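Both guards can be sketched in a few lines — the record shape here is an assumption for illustration, not the actual SQLite schema:

```javascript
// Optimistic locking: every write must present the revision it read.
function applyFlowUpdate(flow, update) {
  if (update.expectedRevision !== flow.revision) {
    return { applied: false, reason: "revision_conflict" };
  }
  Object.assign(flow, update.fields);
  flow.revision += 1;
  return { applied: true };
}

// Sticky cancel intent: once cancelRequestedAt is set it never clears,
// so no child task can be spawned after cancellation is requested.
function runTaskInFlow(flow, spec) {
  if (flow.cancelRequestedAt) return { created: false };
  flow.children.push(spec);
  return { created: true };
}
```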

Why this matters.

Task Flows are the foundation for complex multi-step agent workflows. Any time your agent needs to do something that involves multiple stages, dependencies between tasks, or the possibility of failure and retry — this is the infrastructure that makes it reliable.

3. Video, Music Generation, and ComfyUI

Three new built-in tools added to the agent's capabilities: video_generate, music_generate, and a full ComfyUI plugin architecture.

Video generation.

The video_generate tool accepts a prompt, up to 5 reference images, up to 4 reference videos, and controls for size, aspect ratio, resolution, duration, and audio. The mode is auto-detected based on what you provide: text-to-video (no references), image-to-video (input images), or video-to-video (input videos).

Three providers are supported:

Runway gen4.5 (default) uses separate endpoints for each mode. Polling every 5 seconds through PENDING, RUNNING, and THROTTLED to a terminal SUCCEEDED, FAILED, or CANCELLED state. Text-to-video is limited to 16:9 or 9:16. Max 10 seconds.

xAI grok-imagine-video uses an async poll pattern — POST to create, then poll every 5 seconds (120 max) through queued → processing → done. Max 15 seconds at 480p/720p.

Alibaba Wan runs through the DashScope API with async headers. Supports models across text-to-video, image-to-video, and reference-to-video. One constraint: reference inputs must be remote HTTPS URLs — no buffer upload. Max 10 seconds.

Music generation.

music_generate uses Google Lyria. It's a synchronous call (unlike video) that accepts a prompt, optional lyrics, an instrumental flag, up to 10 reference images for conditioning, and duration/format controls. The API returns both the generated audio and the generated lyrics as separate response parts.

ComfyUI: the model IS the workflow.

The ComfyUI plugin architecture unifies image, video, and music generation under a single execution model. All three providers use runComfyWorkflow() underneath.

The injection pipeline: load workflow JSON → inject prompt into the prompt node → upload reference images via multipart POST → inject filenames into image nodes → POST to /prompt → poll for completion → download outputs. It supports both local ComfyUI instances and Comfy Cloud, with all HTTP going through the SSRF guard.

directSend is an opt-in delivery path that skips the LLM entirely after generation completes. Instead of waking a subagent to deliver the media, it reads the requester's origin from the task handle and sends media URLs directly to the channel with a built-in idempotency key.

4. Prompt Caching Overhaul

A major focus on Anthropic KV-cache stability that translates directly to cost reduction for long sessions.

The boundary trick.

A single HTML comment — <!-- OPENCLAW_CACHE_BOUNDARY --> — is injected into the system prompt. Everything above it is static: identity, tool docs, skills, workspace files like AGENTS.md, SOUL.md, MEMORY.md. Everything below it is dynamic: runtime context and HEARTBEAT.md.

Why does HEARTBEAT.md live below the boundary? Because it changes on every heartbeat run. If it were above the boundary, every heartbeat update would invalidate the entire cache. The implementation is a single line:

const DYNAMIC_CONTEXT_FILE_BASENAMES = new Set(["heartbeat.md"]);

That one Set is the entire decision.
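A minimal sketch of the split itself — the marker string is from the post; the function name and return shape are assumptions:

```javascript
const CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Everything before the marker is the cacheable static prefix;
// everything after varies per turn and is excluded from the cache.
function splitPrompt(systemPrompt) {
  const idx = systemPrompt.indexOf(CACHE_BOUNDARY);
  if (idx === -1) return { static: systemPrompt, dynamic: "" };
  return {
    static: systemPrompt.slice(0, idx),
    dynamic: systemPrompt.slice(idx + CACHE_BOUNDARY.length),
  };
}
```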

Cache stability.

Four normalizations prevent unnecessary cache invalidation: CRLF/CR converted to LF, trailing whitespace stripped per line, leading/trailing blank lines removed from sections, and capability IDs in the runtime line lowercased, deduplicated, and alphabetically sorted.
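The four normalizations, sketched as code (function names are assumptions):

```javascript
// Line endings, trailing whitespace, and blank-line padding.
function normalizeSection(text) {
  return text
    .replace(/\r\n?/g, "\n")                    // CRLF/CR → LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // strip trailing whitespace
    .join("\n")
    .replace(/^\n+/, "")                        // leading blank lines
    .replace(/\n+$/, "");                       // trailing blank lines
}

// Capability IDs: lowercased, deduplicated, alphabetically sorted.
function normalizeCapabilityIds(ids) {
  return [...new Set(ids.map((id) => id.toLowerCase()))].sort();
}
```

Each transform is idempotent, so re-running the pipeline never produces a new prompt hash.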

MCP tool ordering is also deterministic now:

buildToolDigest(names) {
  return hash(JSON.stringify([...names].toSorted()));
}

Plugin registration order no longer affects cache stability.

Diagnostics.

PromptCacheObservability tracks per-session snapshots. If cacheRead tokens drop more than 1,000 from the prior turn and fall below a 95% stability ratio, it records a PromptCacheBreak with specific change codes. Setting OPENCLAW_CACHE_TRACE=1 enables full JSONL trace logging with SHA-256 digests at every stage.

Why this matters.

For long-running agent sessions on Anthropic models, this is a direct cost reduction. Every unnecessary cache break means reprocessing the entire system prompt. The boundary trick, the normalizations, and the deterministic ordering all work together to keep the cache hot.

5. Security Hardening

An unusually active security push across all five releases, targeting supply-chain attacks, SSRF, timing attacks, and execution environment isolation.

Execution environment blocklist.

Two tiers of blocked environment variables, each serving a different purpose.

blockedKeys (~52 variables) are always stripped from the execution environment. These include NODE_OPTIONS, PYTHONHOME, PYTHONPATH, BASH_ENV, all DYLD_* and LD_* dynamic linker variables, Java options (JAVA_OPTS, JAVA_TOOL_OPTIONS, JDK_JAVA_OPTIONS), build tool options (MAVEN_OPTS, GRADLE_OPTS), Rust compiler overrides, .NET startup hooks, and SSLKEYLOGFILE.

blockedOverrideKeys (80+ variables) are blocked from config injection specifically. This is the supply-chain defense layer: PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, PIP_TRUSTED_HOST (Python package index hijack), UV_INDEX, UV_INDEX_URL (uv package manager), GOPROXY, GONOSUMDB (Go modules), HTTP_PROXY, HTTPS_PROXY, ALL_PROXY (proxy redirection), NODE_EXTRA_CA_CERTS, SSL_CERT_FILE, CURL_CA_BUNDLE (TLS CA root hijack), DOCKER_HOST, DOCKER_TLS_VERIFY (Docker endpoint redirect), plus prefix blocks on GIT_CONFIG_*, NPM_CONFIG_*, and CARGO_REGISTRIES_*.

The separation is principled: blockedKeys prevent execution-time attacks regardless of source. blockedOverrideKeys prevent configuration-time attacks where a malicious config could redirect dependency resolution to attacker-controlled servers.
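A sketch of the two-tier filter — the variable lists are abbreviated to a few entries each for illustration; the real lists are far longer:

```javascript
// Tier 1: always stripped from the execution environment.
const blockedKeys = new Set(["NODE_OPTIONS", "PYTHONPATH", "BASH_ENV"]);
const blockedPrefixes = ["DYLD_", "LD_"]; // dynamic linker variables

// Tier 2: additionally blocked when the value comes from config injection.
const blockedOverrideKeys = new Set(["PIP_INDEX_URL", "GOPROXY", "HTTPS_PROXY"]);
const blockedOverridePrefixes = ["GIT_CONFIG_", "NPM_CONFIG_", "CARGO_REGISTRIES_"];

function sanitizeEnv(env, { fromConfig = false } = {}) {
  const out = {};
  for (const [key, value] of Object.entries(env)) {
    if (blockedKeys.has(key)) continue;
    if (blockedPrefixes.some((p) => key.startsWith(p))) continue;
    if (fromConfig) {
      if (blockedOverrideKeys.has(key)) continue;
      if (blockedOverridePrefixes.some((p) => key.startsWith(p))) continue;
    }
    out[key] = value;
  }
  return out;
}
```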

SSRF redirect guard — two-phase DNS pinning.

Every request and every redirect hop gets two validation phases.

Phase 1 (pre-DNS): Reject private hostnames (localhost, *.local, *.internal, metadata.google.internal), all private/loopback/link-local IPs, legacy octal/hex IPv4, and IPv6 special-use ranges.

Phase 2 (post-DNS): Re-validate all resolved addresses. This catches DNS rebinding attacks where a public hostname resolves via CNAME to a private IP like 192.168.x.x.

Redirect handling is manual (max 3 hops). Every hop gets fresh SSRF checks. Authorization and Cookie headers are stripped on cross-origin redirects. Body is dropped on 303/POST→GET rewrites.
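Phase 1 can be sketched as a simple pre-DNS screen — deliberately simplified; the real guard also handles legacy octal/hex IPv4 and the IPv6 special-use ranges:

```javascript
// Private hostnames: localhost, cloud metadata, *.local, *.internal.
const privateHostnames =
  /^(localhost|metadata\.google\.internal)$|\.(local|internal)$/i;

// RFC 1918 ranges plus loopback and link-local, in dotted-quad form.
const privateIpv4 =
  /^(10\.|127\.|169\.254\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/;

function isBlockedTarget(host) {
  return privateHostnames.test(host) || privateIpv4.test(host);
}
```

Phase 2 runs the same IP checks again over every address DNS actually resolved, which is what defeats rebinding.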

Timing-safe secret comparison.

function safeEqualSecret(provided, expected) {
  if (typeof provided !== "string" || typeof expected !== "string")
    return false;
  const hash = (s) => createHash("sha256").update(s).digest();
  return timingSafeEqual(hash(provided), hash(expected));
}

The SHA-256 pre-hash normalizes inputs to the same fixed length before timingSafeEqual. This is critical because Node's timingSafeEqual throws on length mismatch — which would itself leak length information via the exception timing. Now standardized across all webhook handlers.

Device pairing and trust boundaries.

Pairing tokens use 256-bit CSPRNG. The v3 payload format adds platform and deviceFamily to the signed payload, so spoofed metadata invalidates the signature. Non-operator sessions can only manage their own tokens.

Mixed trust configurations — trusted-proxy and shared token simultaneously — now hard-fail at startup. Previously, a co-located process could bypass the proxy and fall through to token auth.

Plugin HTTP routes default to operator.write scope only, not admin-level. CLI operator scopes are restricted to shared-secret bearer auth on OpenAI-compatible routes.

6. ACPX Embedded Runtime + CLI MCP Bridge

Two changes that expand how Eragon agents interact with external tools and CLI environments.

reply_dispatch hook.

The ACPX plugin registers a reply_dispatch hook that fires after the LLM completes a turn, right before reply delivery to channels. If the session is ACP-managed, ACPX takes ownership of dispatch and routes through the ACP control plane. If not, it returns { handled: false } and normal delivery runs. This replaces a hardcoded ACP path check in core auto-reply routing — cleaner separation of concerns.

Loopback MCP bridge.

When Eragon runs a CLI backend, it spins up a local HTTP/MCP server on 127.0.0.1 with a random port and a 64-character hex bearer token. The CLI backend receives an MCP config pointing to this server, so Claude CLI, Codex, and Gemini CLI can all call Eragon's full tool set via JSON-RPC.

Config injection is per-CLI: Claude CLI gets --strict-mcp-config --mcp-config <path>, Codex gets -c mcp_servers=..., and Gemini CLI gets a settings.json write.

Per-session tool resolution via McpLoopbackToolCache ensures each CLI backend sees exactly the tools available for that session, agent, and workspace.

Security for Claude CLI specifically: CLAUDE_CONFIG_DIR, CLAUDE_CODE_PLUGIN_*, provider-routing overrides, and Bedrock/Vertex/Foundry env vars are all cleared before launch. The CLI is forced to --setting-sources user to prevent repo-local .claude settings from executing inside non-interactive sessions.

7. Everything Else

A rapid-fire list of the other changes that shipped across the five releases.

Control UI Multilingual. 12 languages added: Chinese (Simplified and Traditional), Portuguese (Brazil), German, Spanish, Japanese, Korean, French, Turkish, Indonesian, Polish, and Ukrainian.

QQ Bot. New bundled channel with multi-account support, SecretRef credentials, slash commands, and media handling.

Execution Defaults Changed. Gateway and node execution now defaults to security=full, ask=off — no approval prompts by default.

Matrix Improvements. Draft streaming (partial replies update the same message in place), exec approvals via native Matrix prompts, DM session isolation, and E2EE thumbnail encryption.

Amazon Bedrock. Guardrails support, inference-profile discovery, and automatic request-region injection.

iOS Exec Approvals via APNs. Native in-app approval modal triggered by push notification.

SearXNG. New bundled web search provider plugin.

Plugin Install Security. Dangerous-code critical findings now fail closed by default.

Config Schema. New openclaw config schema CLI command prints full JSON Schema for openclaw.json.

Cron Improvements. Interrupted recurring jobs replay on first gateway restart (not second). Failure notifications go through the job's primary delivery channel.

What's Next

The direction is clear: make agents more reliable, more autonomous, and more secure. Memory Dreaming gives agents the ability to learn what matters over time. Task Flows give them durable orchestration for complex workflows. The security hardening makes all of it safe to run in production.

We're building in the open. If you want to see the code behind any of this, the OpenClaw project is where it lives.

Book a demo to see Eragon in action.