An AI agent is defined by a stack of plain-text files, not a single persona file. muster checks each file against its spec, then checks whether your model behaves the way the files declare. It runs offline and deterministically, with no provider baked in.
The stack has seven layers: a persona (Soul.md), Agent Skills (SKILL.md), an
operating procedure (AGENTS.md), a tool manifest (TOOLS.md), a memory store,
a heartbeat checklist, and an A2A agent card. muster validates each one and
checks how they compose.
Two kinds of check
Static checks parse a file and validate it against its spec, offline and
byte-for-byte reproducible. Behavioral checks grade a live model against
what the file declares, over real multi-turn conversations.
Bring your own model
Behavioral grading talks to any OpenAI-compatible endpoint: local Ollama,
NVIDIA NIM, OpenAI, whatever you run. The API key is read from the
environment at request time. It never goes in a flag, a file, or a manifest.
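To make "read from the environment at request time" concrete, here is a minimal Python sketch of a client for an OpenAI-compatible chat endpoint. It is not muster's code: the function name build_chat_request and the variable name OPENAI_API_KEY are assumptions chosen for illustration. The request is built but not sent, so the sketch runs without a network.

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request.

    The key is looked up inside this function, at request time, so it never
    has to be stored in a flag, a file, or a manifest.
    """
    # Assumption: OPENAI_API_KEY is an illustrative variable name only.
    api_key = os.environ.get("OPENAI_API_KEY", "")

    headers = {"Content-Type": "application/json"}
    if api_key:
        # Local servers often need no key, so the header is optional.
        headers["Authorization"] = f"Bearer {api_key}"

    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )
```

Pointing base_url at a local Ollama server (http://localhost:11434/v1) or at a hosted provider changes nothing else in the call, which is what "OpenAI-compatible" buys you.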
Deterministic and offline
Static checking needs no network and produces the same bytes every run
(RFC 8785 canonical JSON). You can wire it into CI as a hard gate and trust
the result.
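The idea behind byte-for-byte reproducible output can be sketched in a few lines of Python. This is an approximation of RFC 8785, not a full implementation and not muster's code: it matches the canonical form for data made of strings, booleans, nulls, integers, lists, and objects with ordinary keys, but it does not reproduce the RFC's float formatting or its UTF-16 key ordering for characters outside the Basic Multilingual Plane. The report fields are hypothetical.

```python
import hashlib
import json


def canonical_bytes(report: dict) -> bytes:
    """Serialize a report with sorted keys and no insignificant whitespace.

    Approximates RFC 8785 for strings, bools, None, ints, lists, and dicts.
    Floats and non-BMP key ordering are NOT handled the way the RFC requires.
    """
    return json.dumps(
        report,
        sort_keys=True,          # stable key order regardless of insertion order
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # emit UTF-8 directly instead of \uXXXX escapes
        allow_nan=False,         # NaN and Infinity are not valid JSON
    ).encode("utf-8")


# Two reports with the same content but different key order (hypothetical fields).
run_a = {"layer": "persona", "passed": True, "failures": []}
run_b = {"failures": [], "passed": True, "layer": "persona"}

digest_a = hashlib.sha256(canonical_bytes(run_a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(run_b)).hexdigest()
```

Because both runs serialize to identical bytes, their digests match, and a CI job can compare a stored digest against a fresh one to detect any change in the result.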
Each layer below has a static mode that always runs, and most add a behavioral
mode that runs only when you point it at an endpoint. The
layers reference covers each one in detail.
The files an agent reads at startup have multiplied. A persona file sets its
voice and safety posture. A skills directory tells it what it can do and when.
An AGENTS.md encodes the standard operating procedure. A tools manifest
declares the functions it may call. A memory file holds what it should remember
about you. A heartbeat checklist drives its scheduled work. An agent card
advertises it to other agents.
Each of these has a spec, and a file that parses is not the same as a file the
model actually follows. muster treats both questions as testable. It validates
the file against the spec, and it grades the model against the file.
muster began as the reference CTS-1 harness for
Soul.md RFC-1, the persona format. The
engine underneath turned out to be spec-agnostic, so the same core now drives
seven layers and the cross-layer composition checks. The normative spec text is
vendored in the repository and is the single source of truth for every check.
Each test names the section it enforces.