Production architecture for long-running AI agents

Durable execution, queues, autoscaling, state and recovery: the parts you need to run long-running agents for real, and how to get them without building a platform.

TL;DR

Running long-running AI agents in production needs more than a model and a loop: durable execution so work survives crashes, state that persists across sessions, recovery, safe density, and integrations. You either build that architecture or run on a runtime that ships it.

  • Durable execution: a task must survive a crash or restart and resume where it stopped, not start over.
  • Persistent state and a versioned filesystem, so the agent keeps context over days and can be restored.
  • Recovery and safe density: self-healing when the agent dies, and packing many agents without crashing the node.
  • Build it yourself (durable execution engine, queues, autoscaler, recovery) or run on a managed runtime that includes it.

What long-running changes about the architecture

A short-running agent is a request: it runs and finishes, so the architecture is simple. A long-running agent stays alive for hours or days, which means it has to survive crashes, hold state, act on its own, and share hardware with many others. That is a different, harder architecture.

Short-running: sandboxes and workflows (E2B, BrowserUse, Modal)

spin up, run, tear down, repeated for every task

Stateless. Re-hydrates state, re-auths and reconnects every time. Great for code execution, scraping and batch tasks.

Long-running: persistent agents (OpenClaw, Hermes)

always on, keeps state, acts on its own, self-heals

Persistent. An agent that lives, remembers and takes initiative. The only catch is idle cost, which packing agents onto shared, over-committed capacity or running on your own always-on infrastructure reduces.

Durable execution and queues

If an agent crashes mid-task, the work should resume, not restart from zero. That is durable execution.

  • Durable execution: persist each step so a restart resumes where it stopped, instead of losing progress.
  • Queues and scheduling: trigger agents on events or a heartbeat, and absorb bursts without dropping work.
  • Idempotency: re-running a step after a crash must not double-charge or double-send.
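
As a minimal sketch of the first and third points, the following persists each completed step to a journal file so that a restarted task skips finished work. It is an illustration under stated assumptions, not how any particular engine implements durable execution: `Journal`, `run_task`, the JSON file format and the `task_id:step` key scheme are all hypothetical names chosen for this example.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

# A step is (name, function). The function receives an idempotency key that it
# should pass on to any external API it calls.
Step = Tuple[str, Callable[[str], str]]


class Journal:
    """Record of completed steps, persisted as a JSON file (hypothetical format)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)

    def record(self, step_id: str, result: str) -> None:
        self.done[step_id] = result
        # Write to a temp file and rename it, so a crash mid-write cannot
        # leave a truncated journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.done, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run_task(task_id: str, steps: List[Step], journal: Journal) -> List[str]:
    """Run steps in order, skipping any the journal already marks as done."""
    results: List[str] = []
    for name, fn in steps:
        step_id = f"{task_id}:{name}"  # stable across restarts
        if step_id in journal.done:
            results.append(journal.done[step_id])  # resume: reuse saved result
            continue
        result = fn(step_id)  # step_id doubles as the idempotency key
        journal.record(step_id, result)
        results.append(result)
    return results


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.json")
    steps = [("fetch", lambda key: "report"), ("send", lambda key: f"sent with {key}")]
    print(run_task("task-1", steps, Journal(path)))
    # A second run against the same journal executes nothing new.
    print(run_task("task-1", steps, Journal(path)))
```

The journal alone does not make a step idempotent. A crash after the step's side effect but before `record` completes will run the step again on restart, which is why the same key is handed to the step: an external API that deduplicates on that key can then refuse the second charge or send.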

State, persistence and recovery

Long-running agents accumulate state, and that state has to be durable and recoverable. In practice that means a versioned filesystem with point-in-time restore, plus a recovery loop: detect the crash, restart the process, recreate the pod if needed, restore a known-good state, and alert only when automation cannot fix it.
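
To make point-in-time restore concrete, here is a small in-memory sketch that keeps every version of every file and can answer "what did the filesystem look like at time T". The `VersionedStore` class and its methods are hypothetical names for this example; a production system would more likely sit on an object store with versioning enabled than on a Python dict.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Keeps every write to every path so state can be read as of a past time."""

    def __init__(self) -> None:
        # path -> list of (timestamp, content); content None marks a deletion
        self._history: Dict[str, List[Tuple[float, Optional[str]]]] = {}

    def write(self, path: str, content: Optional[str], ts: float) -> None:
        versions = self._history.setdefault(path, [])
        if versions and ts < versions[-1][0]:
            raise ValueError("timestamps must be non-decreasing per path")
        versions.append((ts, content))

    def delete(self, path: str, ts: float) -> None:
        self.write(path, None, ts)

    def read_at(self, path: str, ts: float) -> Optional[str]:
        """Return the content of path as of time ts, or None if absent then."""
        versions = self._history.get(path, [])
        idx = bisect_right([t for t, _ in versions], ts)
        return versions[idx - 1][1] if idx else None

    def snapshot_at(self, ts: float) -> Dict[str, str]:
        """Return every file that existed at time ts, with its content then."""
        snap: Dict[str, str] = {}
        for path in self._history:
            content = self.read_at(path, ts)
            if content is not None:
                snap[path] = content
        return snap
```

Restoring a known-good state then amounts to choosing a timestamp from before the bad change and materializing `snapshot_at(ts)` for the agent.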

  • 01 In-pod restart: a daemon restarts OpenClaw the instant it dies.
  • 02 Pod recreation: if the pod fails, it is recreated with state intact.
  • 03 Known-good restore: config auto-repair and a versioned restore.
  • 04 Critical alert: only if all else fails, with a full post-mortem.

Crashes caught in under 60s, restored in under 90s. A RAM semaphore sheds the lowest-priority agent before a shared node runs out of memory, so density never becomes an outage.
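
The escalation through the recovery tiers can be sketched as a ladder: try the cheapest fix first, check health, and alert only when every tier has been exhausted. This is a generic illustration, not Molted's implementation. `RecoveryLadder`, the tier callables and the `is_healthy` probe are hypothetical, and a real system would add timeouts and backoff between tiers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A tier is (name, action). The action attempts one kind of repair.
Tier = Tuple[str, Callable[[], None]]


@dataclass
class RecoveryLadder:
    tiers: List[Tier]                 # ordered from cheapest to most drastic
    is_healthy: Callable[[], bool]    # health probe run after each attempt
    alert: Callable[[str], None]      # called only when every tier has failed
    log: List[str] = field(default_factory=list)

    def recover(self) -> Optional[str]:
        """Return the name of the tier that restored health, or None after alerting."""
        for name, action in self.tiers:
            try:
                action()
            except Exception as exc:
                self.log.append(f"{name} failed: {exc}")
                continue
            if self.is_healthy():
                self.log.append(f"{name} succeeded")
                return name
            self.log.append(f"{name} did not restore health")
        # The accumulated log is the raw material for a post-mortem.
        self.alert("automatic recovery exhausted: " + "; ".join(self.log))
        return None
```

Used with three actions such as restart, recreate and restore, `recover()` stops at the first tier that makes the probe pass, so a human is paged only for failures automation could not fix.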

Autoscaling and safe density

Agents mostly wait, so packing many onto shared capacity is how the economics work. It is also how a node runs out of memory and takes every agent on it down. Safe density needs a throttle on startups, real-time memory monitoring, and selective shutdown by priority before the node is overwhelmed. Autoscaling alone does not give you that; the protection has to be agent-aware.
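
A rough sketch of those three controls, assuming memory is measured per agent in megabytes and that a higher `priority` number means a more important agent. `NodeGuard`, `Agent`, the 85% high-water mark and the startup cap are illustrative choices for this example, not values from any real runtime.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    priority: int  # assumption: higher number = more important
    rss_mb: int    # measured resident memory, not the requested amount


def used_mb(agents: List[Agent]) -> int:
    return sum(a.rss_mb for a in agents)


class NodeGuard:
    """Admission control and load shedding for one shared node."""

    def __init__(self, capacity_mb: int, high_water_pct: int = 85, max_starting: int = 2) -> None:
        self.limit_mb = capacity_mb * high_water_pct // 100
        self.max_starting = max_starting

    def can_start(self, agents: List[Agent], starting_now: int, expected_mb: int) -> bool:
        """Throttle startups and refuse any that would cross the high-water mark."""
        if starting_now >= self.max_starting:
            return False
        return used_mb(agents) + expected_mb <= self.limit_mb

    def to_shed(self, agents: List[Agent]) -> List[Agent]:
        """Pick agents to stop, lowest priority first, until usage is under the mark."""
        excess = used_mb(agents) - self.limit_mb
        victims: List[Agent] = []
        # Within one priority level, stop the largest agent first to shed fewer agents.
        for agent in sorted(agents, key=lambda a: (a.priority, -a.rss_mb)):
            if excess <= 0:
                break
            victims.append(agent)
            excess -= agent.rss_mb
        return victims
```

A monitoring loop would call `to_shed` on every memory sample and stop the returned agents before the kernel's out-of-memory killer picks a victim with no knowledge of priority.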

One agent: online. Easy to babysit.

A fleet, by hand (status legend: online, crashed, out of memory, config broken)
Every red, amber or grey square is a silent outage: an agent down until someone notices. One is manageable. Hundreds, each failing in its own way around the clock, is impossible without watchers and automatic recovery.

Build it, or run on a runtime that ships it

You can assemble this yourself: a durable execution engine, queues, an autoscaler, a recovery system, a versioned store, an integration layer. That is a platform, and it is months of work plus permanent on-call. The alternative is a runtime that ships the whole architecture. Molted is that runtime for long-running agents (OpenClaw today, Hermes on request): 4-tier self-healing, a RAM semaphore for safe density, a versioned S3-backed filesystem with point-in-time restore, and 1,000+ integrations, managed.

FAQ

Q.01

What is the production architecture for long-running AI agents with durable execution, queues and autoscaling?

It is the set of components that keep an always-on agent alive and correct: durable execution so work resumes after a crash, queues and scheduling for event and heartbeat triggers, persistent versioned state, a recovery loop, and autoscaling with agent-aware safe density. You build it, or run on a runtime that includes it.

Q.02

What is durable execution for AI agents?

Durable execution means each step of an agent's work is persisted so that if the process crashes, it resumes from where it stopped instead of restarting. It is what lets a long-running agent run a task over hours or days reliably.

Q.03

How do you autoscale long-running agents without crashing the node?

Plain autoscaling is not enough because agents packed onto a shared node can together demand more memory than it has. You need agent-aware safe density: throttle startups, monitor real memory use, and selectively stop low-priority agents before a shared node runs out of memory. Molted does this with a RAM semaphore.

Q.04

Should I build this architecture or use a managed runtime?

If running agent infrastructure is your product, build it. If your product is the agents, building durable execution, queues, autoscaling, recovery and integrations is months of undifferentiated work plus on-call. A managed runtime like Molted ships the architecture so you ship agents instead.

Q.05

Does Molted provide this production architecture?

Yes. Molted is a managed runtime for long-running AI agents with the full architecture built in: 4-tier self-healing, a RAM semaphore for safe density, a versioned S3-backed filesystem with point-in-time restore, queues and scheduling, and 1,000+ integrations, running OpenClaw today and Hermes on request.

Skip building the platform. Run long-running agents on a runtime that ships the architecture.