Sandbox Architecture

Sibyl's sandbox system provides isolated, ephemeral execution environments for AI agents. The architecture splits into two planes: a control plane (the API server) that manages sandbox lifecycle and task routing, and an execution plane (runner daemons) that runs tasks inside sandboxed environments. Runtime lifecycle is delegated to the Sandbox CRD from kubernetes-sigs/agent-sandbox.

Architecture Overview

┌─────────────────────────────────────────┐
│           Sibyl API Server              │
│  ┌──────────────┐ ┌──────────────────┐  │
│  │ Controller   │ │   Dispatcher      │  │
│  │ (CRD client) │ │  (task queue)     │  │
│  └──────┬───────┘ └────────┬─────────┘  │
│         │                  │            │
│  ┌──────┴──────────────────┴─────────┐  │
│  │      WebSocket Protocol           │  │
│  └──────────────┬────────────────────┘  │
└─────────────────┼───────────────────────┘
                  │
     ┌────────────┼──────────────┐
     │            │              │
 ┌───┴───┐  ┌───┴───┐     ┌────────────────┐
 │Runner │  │Runner │ ... │ agent-sandbox  │
 │(pod)  │  │(pod)  │     │ controller/CRD │
 └───────┘  └───────┘     └────────────────┘

Controller manages sandbox lifecycle by creating, updating, and deleting agents.x-k8s.io/v1alpha1 Sandbox resources.
Dispatcher routes tasks to runners based on availability, capabilities, and warm worktree proximity.
WebSocket Protocol provides bidirectional communication between server and runners.
Runners are stateless daemons that register with the server and execute assigned tasks.
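
As a rough illustration of what the Controller might submit, the sketch below builds a Sandbox manifest as a plain dictionary. The apiVersion and kind come from the text above; the `build_sandbox_manifest` helper, the container name `runner`, and the `spec.podTemplate` layout are assumptions made for illustration and should be checked against the agent-sandbox CRD schema before use.

```python
from typing import Any

API_VERSION = "agents.x-k8s.io/v1alpha1"  # group/version named in the text above
KIND = "Sandbox"


def build_sandbox_manifest(name: str, namespace: str, image: str) -> dict[str, Any]:
    """Return a Sandbox manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed layout: a pod template carrying a single runner container.
            "podTemplate": {
                "spec": {"containers": [{"name": "runner", "image": image}]}
            },
        },
    }


manifest = build_sandbox_manifest(
    name="sandbox-demo",
    namespace="default",
    image="ghcr.io/hyperb1iss/sibyl-sandbox:latest",
)
```

The defaults used in the example call (namespace `default` and the `sibyl-sandbox` image) are the ones listed in the configuration reference.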

Compute Tiers

Sibyl supports multiple isolation levels, chosen per deployment:

Tier	Isolation	Use Case
Local	Process-level	Development, testing
Docker	Container-level	CI/CD, staging
Kubernetes	Pod-level	Production
vCluster	Cluster-level	Multi-tenant production

Higher tiers provide stronger isolation at the cost of provisioning latency.

BYOD Model

Sibyl uses a Bring Your Own Device model for runner infrastructure. Runners self-register with the API server, declaring:

Capabilities — What the runner can do (e.g., docker, gpu, high-memory)
Project affinity — Which projects have warm worktrees on this runner

The task router scores candidate runners on three axes:

Availability — Is the runner idle or near capacity?
Capability match — Does the runner have the required capabilities?
Warm worktree proximity — Does the runner already have the project cloned and ready?

This scoring model minimizes cold-start time by preferring runners that already have the right environment cached.
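
The three axes can be sketched as a small scoring function. This is only an illustration: the `Runner` fields, the weights, and the choice to treat capability match as a hard filter rather than a weighted term are assumptions, since the text does not say how the axes are combined.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical weights; the source does not specify how the axes are combined.
W_AVAILABILITY = 1.0
W_WARM_WORKTREE = 2.0


@dataclass
class Runner:
    runner_id: str
    active_tasks: int
    max_tasks: int
    capabilities: frozenset = frozenset()
    warm_projects: frozenset = frozenset()


def score_runner(runner: Runner, required: frozenset, project_id: str) -> Optional[float]:
    """Return a score, or None if the runner cannot take the task."""
    # Capability match: treated here as a hard requirement (assumption).
    if not required <= runner.capabilities:
        return None
    # Availability: runners at capacity are skipped, idle runners score highest.
    if runner.active_tasks >= runner.max_tasks:
        return None
    availability = 1.0 - runner.active_tasks / runner.max_tasks
    # Warm worktree proximity: bonus if the project is already checked out.
    warm = 1.0 if project_id in runner.warm_projects else 0.0
    return W_AVAILABILITY * availability + W_WARM_WORKTREE * warm


def pick_runner(runners: Iterable[Runner], required: frozenset, project_id: str) -> Optional[Runner]:
    """Return the highest-scoring eligible runner, or None if there is none."""
    best, best_score = None, None
    for runner in runners:
        score = score_runner(runner, required, project_id)
        if score is not None and (best_score is None or score > best_score):
            best, best_score = runner, score
    return best
```

With these weights a half-loaded runner holding a warm worktree outranks an idle runner that would need a fresh clone, which is the preference the scoring model describes.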

Runner Daemon Protocol

Communication between the API server and runners uses a WebSocket-based bidirectional protocol.

Server-to-Runner Messages

Message	Description
`heartbeat`	Periodic liveness check
`task_assign`	Assign a queued task to this runner
`task_cancel`	Cancel a running task

Runner-to-Server Messages

Message	Description
`heartbeat_ack`	Acknowledge liveness check
`status`	Report runner load, capabilities, health
`task_ack`	Confirm task assignment accepted
`task_complete`	Report task finished (with result/artifacts)
`task_reject`	Decline task assignment (capacity, mismatch)
`agent_update`	Stream agent progress/logs during execution
`project_register`	Register or update project affinity

The protocol is designed to be resilient to transient disconnects. Runners automatically reconnect and re-register on connection loss. Tasks that were in-flight during a disconnect enter a grace period before being reassigned.
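
A minimal sketch of how the message types above could be framed and validated follows. The message names are taken from the tables; the JSON envelope with `type` and `payload` keys, the helper names, and the `grace_seconds` parameter are assumptions, as the wire format and the grace period length are not documented here.

```python
import json
from typing import Any

SERVER_TO_RUNNER = frozenset({"heartbeat", "task_assign", "task_cancel"})
RUNNER_TO_SERVER = frozenset({
    "heartbeat_ack", "status", "task_ack", "task_complete",
    "task_reject", "agent_update", "project_register",
})


def encode_message(msg_type: str, payload: dict[str, Any]) -> str:
    """Serialize a message into the assumed JSON envelope."""
    if msg_type not in SERVER_TO_RUNNER | RUNNER_TO_SERVER:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return json.dumps({"type": msg_type, "payload": payload})


def decode_message(raw: str, allowed: frozenset) -> tuple[str, dict[str, Any]]:
    """Parse a message and reject types not valid for this direction."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") not in allowed:
        raise ValueError("message type not allowed in this direction")
    return msg["type"], msg.get("payload", {})


def should_reassign(disconnected_at: float, now: float, grace_seconds: float) -> bool:
    """True once an in-flight task's grace period has elapsed."""
    return now - disconnected_at >= grace_seconds
```

A runner would call `decode_message(raw, SERVER_TO_RUNNER)` on incoming frames, while the server would pass `RUNNER_TO_SERVER`, so a message sent in the wrong direction is rejected early.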

Sandbox Lifecycle

pending → starting → running → suspending → suspended → deleted
                        │                       │
                        └── failed ◄────────────┘

State	Description
`pending`	Sandbox requested, waiting for resources
`starting`	Provisioning environment (pulling images, cloning repos)
`running`	Active and accepting tasks
`suspending`	Saving state before suspension
`suspended`	Idle, state preserved, resources released
`deleted`	Cleaned up, resources freed
`failed`	Error state, reachable from any other state

Sandboxes auto-suspend after SIBYL_SANDBOX_IDLE_TTL_SECONDS of inactivity (default 1800, or 30 minutes) and are hard-deleted after SIBYL_SANDBOX_MAX_LIFETIME_SECONDS (default 14400, or 4 hours). Suspend and resume map to setting Sandbox.spec.replicas to 0 and 1, respectively, in the agent-sandbox CRD.
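
The idle and lifetime rules can be sketched as a small decision function, together with the replicas patch used for suspend and resume. The function names, the action strings, and the rule that deletion takes precedence over suspension are assumptions; the default values come from the configuration reference.

```python
from typing import Any

IDLE_TTL_SECONDS = 1800        # default of SIBYL_SANDBOX_IDLE_TTL_SECONDS
MAX_LIFETIME_SECONDS = 14400   # default of SIBYL_SANDBOX_MAX_LIFETIME_SECONDS


def lifecycle_action(
    state: str,
    idle_seconds: float,
    age_seconds: float,
    idle_ttl: float = IDLE_TTL_SECONDS,
    max_lifetime: float = MAX_LIFETIME_SECONDS,
) -> str:
    """Return "delete", "suspend", or "none" for a sandbox (hypothetical helper)."""
    if state == "deleted":
        return "none"
    # Assumption: the lifetime cap wins over idle suspension.
    if age_seconds >= max_lifetime:
        return "delete"
    if state == "running" and idle_seconds >= idle_ttl:
        return "suspend"
    return "none"


def replicas_patch(suspended: bool) -> dict[str, Any]:
    """Build a patch body that sets Sandbox.spec.replicas to 0 or 1."""
    return {"spec": {"replicas": 0 if suspended else 1}}
```

A reconciliation loop could call `lifecycle_action` for each sandbox and, on "suspend", apply `replicas_patch(True)` to the corresponding Sandbox resource.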

Task Lifecycle

queued → dispatched → acked → running → completed
                                  │
                                  ├── failed → retry → queued
                                  └── canceled

State	Description
`queued`	Task submitted, waiting for runner assignment
`dispatched`	Assigned to a runner, awaiting acknowledgment
`acked`	Runner confirmed receipt
`running`	Actively executing
`completed`	Finished successfully
`failed`	Execution error (may retry)
`retry`	Scheduled for re-queue after failure
`canceled`	Explicitly canceled by user or system

Failed tasks are retried up to a configurable limit before being marked as permanently failed.
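
The task states and the retry limit can be modeled as a transition table. The sketch below encodes only the edges drawn in the diagram; the `Task` class, the `advance` helper, and the `max_retries` parameter (with its default of 3) are hypothetical, as the text says only that the limit is configurable.

```python
from dataclasses import dataclass

# Allowed transitions, as drawn in the task lifecycle diagram.
TRANSITIONS = {
    "queued": {"dispatched"},
    "dispatched": {"acked"},
    "acked": {"running"},
    "running": {"completed", "failed", "canceled"},
    "failed": {"retry"},
    "retry": {"queued"},
    "completed": set(),
    "canceled": set(),
}


@dataclass
class Task:
    task_id: str
    state: str = "queued"
    failures: int = 0  # number of failed runs so far


def advance(task: Task, new_state: str, max_retries: int = 3) -> None:
    """Move a task to new_state, enforcing the diagram and the retry limit."""
    if new_state not in TRANSITIONS[task.state]:
        raise ValueError(f"illegal transition {task.state} -> {new_state}")
    if new_state == "retry" and task.failures > max_retries:
        # Limit reached: the task stays in "failed" permanently.
        raise ValueError("retry limit reached")
    if new_state == "failed":
        task.failures += 1
    task.state = new_state
```

Keeping the transitions in one table makes it easy to reject out-of-order updates, for example a `task_complete` arriving for a task that was never acknowledged.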

Auth Model

Sandbox runners authenticate via JWT tokens with specific claims:

Claim	Description
`org`	Organization ID (tenant isolation)
`sub`	Subject (user or service account)
`rid`	Runner ID
`sid`	Sandbox ID (if bound to a sandbox)
`scp`	Scope — must include `sandbox:runner`

Strict binding is enforced for sandbox-bound runners: a runner token with a sid claim can only execute tasks within that specific sandbox. This prevents a compromised runner from accessing other sandboxes in the same organization.
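
The binding rule can be sketched as a check over already-verified claims. This example assumes the token signature and expiry were validated elsewhere, and that `scp` is either a list or a space-delimited string; the `authorize_task` helper and its parameters are hypothetical.

```python
from typing import Any, Mapping, Optional

REQUIRED_SCOPE = "sandbox:runner"


def authorize_task(
    claims: Mapping[str, Any],
    task_org: str,
    task_sandbox_id: Optional[str],
) -> bool:
    """Decide whether a runner with these claims may execute the task."""
    scopes = claims.get("scp", [])
    if isinstance(scopes, str):
        scopes = scopes.split()  # assumed space-delimited form
    if REQUIRED_SCOPE not in scopes:
        return False
    # Tenant isolation: the runner's organization must match the task's.
    if claims.get("org") != task_org:
        return False
    # Strict binding: a token carrying sid is confined to that sandbox.
    sid = claims.get("sid")
    if sid is not None and sid != task_sandbox_id:
        return False
    return True
```

A token without a `sid` claim passes the last check for any sandbox in its organization, which matches the description of strict binding applying only to sandbox-bound runners.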

Configuration Reference

All sandbox configuration uses the SIBYL_SANDBOX_ prefix:

Variable	Default	Description
`SIBYL_SANDBOX_MODE`	`off`	Policy: `off`, `shadow`, `enforced`
`SIBYL_SANDBOX_DEFAULT_IMAGE`	`ghcr.io/hyperb1iss/sibyl-sandbox:latest`	Default container image for sandboxes
`SIBYL_SANDBOX_WORKTREE_BASE`	`/tmp/sibyl/sandboxes`	Base path mounted for sandbox worktrees
`SIBYL_SANDBOX_IDLE_TTL_SECONDS`	`1800`	Auto-suspend after idle (seconds)
`SIBYL_SANDBOX_MAX_LIFETIME_SECONDS`	`14400`	Maximum sandbox lifetime (seconds)
`SIBYL_SANDBOX_K8S_NAMESPACE`	`default`	Kubernetes namespace for sandbox pods
`SIBYL_SANDBOX_RECONCILE_ENABLED`	`true`	Enable background reconciliation loop
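
For illustration, the variables above could be read into a typed settings object as follows. The defaults mirror the table; the `SandboxConfig` class, the `load_config` helper, and the set of strings accepted as a true boolean are assumptions, not Sibyl's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "SIBYL_SANDBOX_"
VALID_MODES = ("off", "shadow", "enforced")
TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed boolean spellings


@dataclass(frozen=True)
class SandboxConfig:
    mode: str = "off"
    default_image: str = "ghcr.io/hyperb1iss/sibyl-sandbox:latest"
    worktree_base: str = "/tmp/sibyl/sandboxes"
    idle_ttl_seconds: int = 1800
    max_lifetime_seconds: int = 14400
    k8s_namespace: str = "default"
    reconcile_enabled: bool = True


def load_config(env: Optional[Mapping[str, str]] = None) -> SandboxConfig:
    """Build a SandboxConfig from SIBYL_SANDBOX_* variables, using table defaults."""
    if env is None:
        env = os.environ
    d = SandboxConfig()
    mode = env.get(PREFIX + "MODE", d.mode)
    if mode not in VALID_MODES:
        raise ValueError(f"invalid {PREFIX}MODE: {mode!r}")
    raw_reconcile = env.get(PREFIX + "RECONCILE_ENABLED")
    return SandboxConfig(
        mode=mode,
        default_image=env.get(PREFIX + "DEFAULT_IMAGE", d.default_image),
        worktree_base=env.get(PREFIX + "WORKTREE_BASE", d.worktree_base),
        idle_ttl_seconds=int(env.get(PREFIX + "IDLE_TTL_SECONDS", d.idle_ttl_seconds)),
        max_lifetime_seconds=int(
            env.get(PREFIX + "MAX_LIFETIME_SECONDS", d.max_lifetime_seconds)
        ),
        k8s_namespace=env.get(PREFIX + "K8S_NAMESPACE", d.k8s_namespace),
        reconcile_enabled=(
            d.reconcile_enabled
            if raw_reconcile is None
            else raw_reconcile.strip().lower() in TRUE_VALUES
        ),
    )
```

Passing a plain dictionary instead of the process environment, as in `load_config({"SIBYL_SANDBOX_MODE": "shadow"})`, keeps the loader easy to test.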

Deployment Modes

The SIBYL_SANDBOX_MODE variable controls how sandboxes are enforced:

`off` (Default)

Sandbox system is completely disabled. Tasks execute directly without isolation. Suitable for single-user development or when external orchestration handles isolation.

`shadow`

Sandbox operations are observed and logged but not enforced. Tasks can execute without a sandbox, but the system tracks what would have been sandboxed. Useful for:

Validating sandbox configuration before enforcement
Monitoring task patterns to tune runner capacity
Gradual rollout of sandbox requirements

`enforced`

All task execution requires a sandbox. Tasks submitted without a valid sandbox assignment are rejected. This is the recommended mode for production deployments where isolation guarantees matter.
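
The difference between the three modes can be summarized as an admission check. This is a sketch under stated assumptions: the `admit_task` helper, its return shape, and the note strings are hypothetical and only mirror the behavior described above.

```python
from typing import Optional


def admit_task(mode: str, sandbox_id: Optional[str]) -> tuple[bool, Optional[str]]:
    """Return (admitted, note) for a task under the given sandbox mode."""
    if mode == "off":
        # Sandboxing disabled: tasks run directly.
        return True, None
    if mode == "shadow":
        # Observe only: never block, but record what would have been sandboxed.
        if sandbox_id is None:
            return True, "shadow: task would have required a sandbox"
        return True, None
    if mode == "enforced":
        # Tasks without a sandbox assignment are rejected.
        if sandbox_id is None:
            return False, "enforced: no sandbox assignment"
        return True, None
    raise ValueError(f"unknown sandbox mode: {mode!r}")
```

In `shadow` mode the note would typically be logged, which gives a count of tasks that enforcement would reject before switching to `enforced`.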

Related Documentation

Orchestrator Architecture: Higher-level agent orchestration
Agent Harness Vision: Autonomous agent execution model
Installation Guide: Sandbox setup instructions
