AutonomyOps — Customer Explainer

INTERNAL: SALES USE ONLY
First-meeting leave-behind for qualified technical buyers. Suitable for engineers, platform leads, and CTOs at or approaching the fleet boundary. Do not publish publicly.


What AutonomyOps is

AutonomyOps is a runtime governance system for autonomous workloads — agents, robots, and any process that issues consequential actions on its own. It sits between an agent’s intent and the world the agent acts on, evaluates each tool call against an explicit policy, and produces a tamper-evident record of every decision.

It is built as three composable layers, each addressing a distinct failure domain:

                                              ┌──────────────────────┐
                                              │     Orchestrator     │
                                              │  (fleet semantics:   │
                                              │   rollout intent,    │
                                              │   policy distribution│
                                              │   aggregated audit)  │
                                              └──────────┬───────────┘
                                                         │ enrollment,
                                                         │ rollout intent,
                                                         │ audit aggregation
                                                         ▼
   ┌─────────┐   ┌────────────────────┐    ┌──────────────────────────────┐
   │  Agent  │──▶│  Autonomy Runtime  │    │      Edge Relay (mesh)       │
   └─────────┘   │  (per-call policy  │◀──▶│  edged ↔ edged ↔ edged ...   │
                 │   enforcement,     │    │  deterministic peer-to-peer  │
                 │   fail-closed,     │    │  segment relay among         │
                 │   local WAL)       │    │  configured peers, bounded   │
                 └────────────────────┘    │  retries, deadletter         │
                                           └──────────────────────────────┘

Layer | Component | Responsibility
------|-----------|---------------
1. Execution | autonomy runtime (ADK) | Intercept every tool call; evaluate policy; fail closed; write WAL. Single-node, fully offline. Owns local enforcement and the authoritative decision record.
2. Edge | edged daemon + edgectl CLI | Deterministic peer-to-peer content relay among configured peers (known_peers) with bounded local resources. Persists accepted segments, retries within a configured budget, and routes to deadletter on exhaustion. Not an orchestrator transport; does not implement convergence, leader election, or fleet semantics.
3. Control plane | AutonomyOps orchestrator | Owns fleet semantics: policy distribution, phased rollout, blast-radius control, rollback, and aggregated audit across enrolled nodes.

The runtime and orchestrator share the policy model, audit format, and runtime contract; the edge layer preserves verified byte movement among configured peers without interpreting policy or fleet intent — by invariant (INV-10), edge/ does not import adk/runtime, adk/policy, or adk/orchestrator. The core autonomy runtime workflow is identical across tiers; additional fleet and operator surfaces (edgectl, orchestrator CLI/UI) layer on top as deployments grow.


What the runtime does

When an agent decides to take an action — call an API, write to a file, issue a command to connected hardware — the runtime intercepts that call, evaluates it against the active policy bundle, and returns an allow or deny decision before the action executes. If the evaluator is unavailable, the system denies. There is no default-permit mode.
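The fail-closed behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the runtime's actual evaluator: the `Decision` type, the `decide` function, the `set_velocity` tool name, and the callable policy shape are all hypothetical, and real policy bundles are evaluated by the runtime rather than by Python callables. The point is the shape of the control flow: only an explicit allow permits the call, and every other outcome, including an evaluator error or a missing policy, resolves to deny.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


# Hypothetical policy shape: a callable that returns True to allow a tool call.
Policy = Callable[[Mapping[str, Any]], bool]


def velocity_limit(call: Mapping[str, Any]) -> bool:
    """Illustrative rule: allow motion commands only at or below 2.0 m/s."""
    if call["tool"] != "set_velocity":
        return True
    return call["args"]["velocity_mps"] <= 2.0


def decide(call: Mapping[str, Any], policy: Optional[Policy]) -> Decision:
    """Evaluate one tool call. Anything other than an explicit allow is a deny."""
    if policy is None:
        # No evaluator available: deny rather than fall back to permit.
        return Decision(False, "no active policy")
    try:
        verdict = policy(call)
    except Exception as exc:
        # Evaluator failure of any kind is treated as a deny.
        return Decision(False, f"evaluator error: {type(exc).__name__}")
    if verdict is True:
        return Decision(True, "policy allow")
    return Decision(False, "policy deny")
```

In this sketch a malformed call (for example, a `set_velocity` call with no `args`) raises inside the rule and is denied, which is the opposite of the default-value fallback a feature-flag client would apply.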

Every decision is written to a tamper-evident local WAL. The WAL survives process crashes, network partitions, and node restarts. It is the authoritative record of what was allowed, what was denied, and what policy was active at each decision point.
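One common way to make an append-only log tamper-evident is to chain each record to the hash of the one before it. The sketch below shows that general technique only; this document does not specify the WAL's on-disk format or integrity mechanism, so the record layout, the `append_record` and `verify` names, and the use of SHA-256 chaining are assumptions for illustration. Durability concerns (fsync, crash recovery) are deliberately left out.

```python
import hashlib
import json

# Assumed starting value for the chain; any fixed constant would do.
GENESIS = "0" * 64


def append_record(log: list, decision: dict) -> None:
    """Append a decision, binding it to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(decision, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
    log.append({"prev": prev, "body": body, "hash": digest})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered record breaks it."""
    prev = GENESIS
    for rec in log:
        expected = hashlib.sha256((prev + rec["body"]).encode("utf-8")).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```

Because each hash covers the previous hash, changing a single earlier decision invalidates every record after it, which is what lets an auditor detect after-the-fact edits.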

The mechanism in one sentence: AutonomyOps is the authority layer between an agent’s intent and its execution — not monitoring what happened, but governing whether it can happen.


What it runs on

The CE runtime installs as a single binary. No Docker, no Kubernetes, no control plane required. It starts in-process alongside the agent, injects AUTONOMY_RUNTIME_URL into the subprocess environment, and begins enforcing immediately.

curl -fsSL https://get.autonomyops.ai/install.sh | bash
autonomy run python3 my_agent.py

CE on a single node provides: fail-closed policy enforcement, per-call audit to the local WAL, policy versioning with last-known-good rollback, and offline operation with cryptographically signed bundles. There is no dependency on network connectivity after install.
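The interaction between signed bundles and last-known-good rollback can be sketched as follows. This is a minimal illustration, assuming a hypothetical `BundleStore` class; it uses HMAC-SHA256 from the Python standard library as a stand-in for signature verification, because this document does not state which signing scheme the bundles use. A bundle that fails verification is never activated, and the previously active bundle stays in force.

```python
import hashlib
import hmac
from typing import Optional


class BundleStore:
    """Hypothetical holder for the active bundle and its predecessor."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.active: Optional[bytes] = None
        self.last_known_good: Optional[bytes] = None

    def _valid(self, payload: bytes, signature: str) -> bool:
        # Stand-in check: HMAC-SHA256 over the payload, compared in constant time.
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)

    def activate(self, payload: bytes, signature: str) -> bool:
        """Activate a bundle only if it verifies; otherwise keep the current one."""
        if not self._valid(payload, signature):
            return False
        self.last_known_good = self.active
        self.active = payload
        return True

    def rollback(self) -> bool:
        """Restore the previous bundle, if one exists."""
        if self.last_known_good is None:
            return False
        self.active = self.last_known_good
        return True
```

Verification here needs only the key and the bundle bytes, so the check works with no network access, which matches the offline claim above.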


The edge layer

Most production autonomy runs on networks that do not behave like data-center networks. Robots move in and out of coverage. Field deployments operate behind constrained uplinks. Sites lose connectivity for minutes, hours, or days, and must continue operating safely the entire time.

The edge layer (edged / edgectl) is deterministic peer-to-peer content relay among configured peers. It moves artifact and segment bytes between nodes; it is not an orchestrator transport, and it does not understand orchestrator intent. It is intentionally narrow in scope:

  • Bounded local persistence. Accepted relay segments are persisted to a local-root store within a configured disk ceiling. Retention and retry budgets are explicit configuration; data exceeding those bounds is rejected at write time or routed to operator-recoverable deadletter, not silently dropped.

  • Deterministic relay, not eventual consistency. The relay ledger drives a fixed state machine (Scheduled → Inflight → Acked | Failed | Deadletter) with a deterministic scan order and no PRNG in the relay path. By invariant (INV-02), there is no convergence protocol, vector clock, CRDT, or gossip.

  • Bounded retries with deadletter. Failed deliveries retry up to max_retries and then halt at deadletter until an operator retries or purges. Edge does not infinitely buffer.

  • Partition-safe local enforcement. A node that loses connectivity to its peers and to the orchestrator continues to enforce its last validated policy via the runtime. The runtime does not degrade to permit; the edge layer does not invent fleet state in anyone’s absence.

  • Static peer topology. Peer relationships are explicit configuration (known_peers), not dynamic discovery. Segments move between configured peers under a per-pair delivery state machine; there is no notion of routing “toward” the orchestrator at the edge layer. That is an orchestrator concern, expressed separately as rollout intent and optional mesh hints.

What edge does not own: policy activation semantics, rollout sequencing, audit aggregation, fleet convergence, or any decision about which node should hold which state. Those belong to the runtime (locally) and the orchestrator (across the fleet). Edge moves artifact and segment bytes among configured peers, deterministically and within bounded local resources — nothing more.
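The relay behavior described in this section (a fixed set of delivery states, a deterministic scan order, bounded retries, and a deadletter that waits for an operator) can be sketched as a small state machine. This is an illustration, not edged's implementation: the `Entry` fields, the `scan` and `operator_retry` functions, the sort key, and the reading of max_retries as "retries after the first attempt" are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class State(Enum):
    SCHEDULED = "Scheduled"
    INFLIGHT = "Inflight"
    ACKED = "Acked"
    FAILED = "Failed"
    DEADLETTER = "Deadletter"


@dataclass
class Entry:
    segment_id: str
    peer: str
    state: State = State.SCHEDULED
    attempts: int = 0


def scan(ledger: List[Entry], send: Callable[[Entry], bool], max_retries: int) -> None:
    """One pass over the ledger in a fixed order; no randomness anywhere."""
    for entry in sorted(ledger, key=lambda e: (e.peer, e.segment_id)):
        if entry.state not in (State.SCHEDULED, State.FAILED):
            continue  # Acked and Deadletter entries are never retried automatically.
        entry.state = State.INFLIGHT
        entry.attempts += 1
        if send(entry):
            entry.state = State.ACKED
        elif entry.attempts > max_retries:
            entry.state = State.DEADLETTER  # Retry budget exhausted: halt here.
        else:
            entry.state = State.FAILED


def operator_retry(entry: Entry) -> None:
    """Explicit operator action: put a deadlettered entry back in the schedule."""
    if entry.state is State.DEADLETTER:
        entry.state = State.SCHEDULED
        entry.attempts = 0
```

With max_retries set to 2, an unreachable peer sees three delivery attempts across three scans and then stops consuming resources until an operator intervenes. Two runs over the same ledger visit entries in the same order, which is what makes the relay path reproducible.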

The edge layer is not optional infrastructure for production fleets — it is what keeps the runtime/orchestrator contract honest when the network does not cooperate.


Why existing stacks do not solve this

The tools most teams reach for — Kubernetes, CI/CD, feature flags, internal scripts — solve adjacent problems. Each one fails at runtime governance for a specific structural reason.

Kubernetes knows whether the agent is running. It does not know whether the agent is behaving correctly. A pod whose agent is executing an unsafe tool call reports Running, Ready, and Healthy throughout. The control loop has no input from the governance layer. Service health is not mission health.

CI/CD governs the artifact. It verifies the right thing was shipped. It cannot verify that the thing shipped is behaving within its authorized parameters right now. “The build passed” does not mean “the policy bundle is active and enforcing.” Deployment governance and runtime governance are not the same problem.

Feature flags control which code path executes. They are configuration, not policy. They have no fail-closed semantics — when the flag service is unreachable, behavior falls back to a default, not to a deny. They produce no per-call audit trail. They cannot express “deny this action if current velocity exceeds 2.0 m/s” — that is a three-line Rego rule.

Internal scripts fail at the moments when governance matters most: under adversarial input, in edge cases the author did not anticipate, during incidents when correctness is required. They have no formal evaluation semantics, no fail-closed guarantee, no tamper-evident record, and no coordination across multiple nodes.

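Question | Kubernetes | CI/CD | Feature flags | Scripts | AutonomyOps
---------|------------|-------|---------------|---------|------------
Is the agent running? | | | | |
Did the agent make tool call X? | | | | |
Was tool call X permitted by policy? | | | | |
Does deny-all fire when evaluator fails? | | | | |
Tamper-evident per-call audit? | | | | |
Survives network partition? | | | | |
Coordinated policy state across nodes? | | | | |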


The fleet boundary

CE handles one node. The moment a deployment spans more than one node — and especially when consistent governance across them is required — the problem changes structurally. The runtime alone cannot answer fleet-level questions; the edge layer alone cannot make fleet-level decisions.

The fleet boundary is the point at which the following questions become unanswerable without a coordination layer:

  • Did the same policy activate correctly on all nodes?

  • What policy is currently active on each node — right now?

  • When a rollback is triggered, did it take effect everywhere, including nodes that were offline at the time?

  • Can a centralized audit trail be produced that covers all nodes for a specific time window?

  • How is rollout blast radius bounded when a bad bundle is detected mid-deployment?
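The first three questions reduce to comparing what each node reports against what the fleet is supposed to be running. The sketch below illustrates why that comparison needs a coordination layer. The `classify` and `rollout_complete` functions, the node names, and the version strings are hypothetical, and the reports dictionary stands in for whatever enrollment and status data an orchestrator would actually collect.

```python
from typing import Dict, List, Optional


def classify(intended: str, reports: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Sort nodes by whether their reported active bundle matches the intended one.

    A report of None means the node has not been heard from (for example, offline).
    """
    result: Dict[str, List[str]] = {"active": [], "stale": [], "unknown": []}
    for node in sorted(reports):
        version = reports[node]
        if version is None:
            result["unknown"].append(node)
        elif version == intended:
            result["active"].append(node)
        else:
            result["stale"].append(node)
    return result


def rollout_complete(intended: str, reports: Dict[str, Optional[str]]) -> bool:
    """A rollout (or rollback) is complete only when no node is stale or unknown."""
    state = classify(intended, reports)
    return not state["stale"] and not state["unknown"]
```

A node in the unknown bucket is still enforcing its last validated policy locally. What is missing is fleet-level knowledge of which policy that is, and no single node can supply it.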

A team that has crossed the fleet boundary is operating with governance risk that compounds with every node added. See the separate Fleet Boundary Diagnostic for the 8-signal self-assessment.


CE to commercial

CE is free and designed for single-node evaluation. The CE runtime is the foundation that the orchestrator builds on for the policy, audit, and runtime contract — same policy language, same audit format, same runtime semantics across tiers. The edge layer ships alongside as deterministic peer-to-peer byte relay; it does not interpret that contract (INV-10). Adopting more layers extends the system; it does not replace it.

When a deployment crosses the fleet boundary, the orchestrator owns the new semantics — policy distribution and activation state across nodes, phased rollout with blast-radius control, aggregated audit, and operator recovery workflows that span every enrolled node. The edge layer relays artifact and segment bytes among configured peers under those conditions, with bounded retries and operator-recoverable deadletter when a peer cannot be reached within budget. Edge moves bytes; the orchestrator decides what those bytes mean for the fleet.

The commercial path is a 90-day evaluation on up to 10 nodes. Evaluation requires a signed Enterprise Evaluation Agreement. There is no per-call or per-request fee. Pricing is per enrolled node with a minimum annual commitment. The conversion trigger is at Day 75.


Adoption ladder

The motion is not “free trial to commercial SKU”; it is a progression from local proof, to controlled access, to fleet management, to private infrastructure when security requirements demand it.

Stage | Customer moment | AutonomyOps tier | Why they move
------|-----------------|------------------|--------------
1. Discover | An autonomy or agent engineer runs the CE demo and sees local runtime governance work on a single node. | Public / CE | They need to prove the mechanism quickly without infrastructure or a sales process.
2. Build seriously | They contact AutonomyOps and receive controlled access for real project work. | Controlled | They need the controlled runtime package, deeper docs, and a qualified evaluation path.
3. Cross the fleet boundary | Their deployment grows beyond a single node and they can no longer reason about policy state, rollout status, or audit posture node by node. | SaaS orchestrator | They need hosted fleet management: centralized audit, rollout visibility, policy coordination, and operator workflows. The runtime keeps enforcing locally during disconnects; the edge layer relays artifact and segment bytes among configured peers when the network returns.
4. Enterprise posture | Security requirements tighten, or a customer requires private deployment, data residency, or network isolation. | Self-hosted / on-prem | They need to operate the orchestrator inside their own infrastructure with their own security controls.

CE proves the tool. Controlled access enables serious deployment. SaaS orchestrator manages fleets. Self-hosted satisfies enterprise security posture.

The tier names preserve that distinction. Controlled describes access to the runtime and evaluation materials; it does not imply the customer’s final deployment model. SaaS describes the hosted fleet-management layer. Self-hosted describes customers who operate that layer themselves.


Next step

If two or more fleet boundary signals apply to a current deployment, the right next step is a 30-minute technical validation call to establish specifics before the pilot conversation.

If four or more signals apply, the conversation is already overdue.

Email info@autonomyops.ai with the signals that apply. We will respond within one business day.


AutonomyOps · autonomyops.ai · Technical Alpha
See also: Fleet Boundary Diagnostic · fleet-boundary-handout.md