muster

An AI agent is defined by a stack of plain-text files, not a single persona file. muster checks each file against its spec, then checks whether your model behaves the way the files declare. It runs offline and deterministically, with no provider baked in.

Get started View on GitHub

Seven layers, one harness

A persona (Soul.md), Agent Skills (SKILL.md), an operating procedure (AGENTS.md), a tool manifest (TOOLS.md), a memory store, a heartbeat checklist, and an A2A agent card. muster validates each one, and checks how they compose.

Two kinds of check

Static checks parse a file and validate it against its spec, offline and byte-for-byte reproducible. Behavioral checks grade a live model against what the file declares, over real multi-turn conversations.

Bring your own model

Behavioral grading talks to any OpenAI-compatible endpoint: local Ollama, NVIDIA NIM, OpenAI, whatever you run. The API key is read from the environment at request time. It never goes in a flag, a file, or a manifest.

Deterministic and offline

Static checking needs no network and produces the same bytes every run (RFC 8785 canonical JSON). You can wire it into CI as a hard gate and trust the result.

What muster checks

Each layer below has a static mode that always runs, and most add a behavioral mode that runs only when you point it at an endpoint. The layers reference covers each one in detail.

Layer	File	Command
Persona	`Soul.md` (Soul.md RFC-1 / CTS-1)	`check`, `resolve`, `cts run`, `behave run`
Skills	`SKILL.md` (agentskills.io)	`skills run`
SOP	`AGENTS.md` (OpenClaw SOP)	`sop run`
Tools	`TOOLS.md`	`tools run`
Memory	`MEMORY.md` / `USER.md`	`memory run`
Heartbeat	`HEARTBEAT.md`	`heartbeat run`
A2A	Agent Card (JSON)	`a2a run`
Cross-layer	composition of the above	`crosslayer run`

What the agent-file stack is

The files an agent reads at startup have multiplied. A persona file sets its voice and safety posture. A skills directory tells it what it can do and when. An AGENTS.md encodes the standard operating procedure. A tools manifest declares the functions it may call. A memory file holds what it should remember about you. A heartbeat checklist drives its scheduled work. An agent card advertises it to other agents.

Each of these has a spec, and a file that parses is not the same as a file the model actually follows. muster treats both questions as testable. It validates the file against the spec, and it grades the model against the file.

Where it came from

muster began as the reference CTS-1 harness for Soul.md RFC-1, the persona format. The engine underneath turned out to be spec-agnostic, so the same core now drives seven layers and the cross-layer composition checks. The normative spec text is vendored in the repository and is the single source of truth for every check. Each test names the section it enforces.