Armature has a small set of core concepts that appear throughout the dashboard, API, and documentation. Understanding how they relate to each other will help you set up tests confidently and interpret results correctly.

## Documentation Index

Fetch the complete documentation index at: https://docs.armature.tech/llms.txt

Use this file to discover all available pages before exploring further.
## MCP Server
An MCP server is any HTTP endpoint that speaks the Model Context Protocol. In Armature, an MCP server is the target you are testing. You register it by providing its base URL, transport type (`streamable_http` or `sse`), and authentication credentials.

When you first connect a server, Armature probes it live — calling `initialize` and `listTools` — to discover the tools it exposes. The discovered tools are stored in Armature's tool catalog for that server and updated whenever you run a live probe or sync the catalog.

Each MCP server can have one or more auth profiles: named credential sets (bearer token or API key header) that workflows and monitors can reference. Credentials are stored encrypted and are never written to the dashboard database.

See Connecting an MCP server for setup instructions.

## Workflow
A workflow is a scheduled test that dispatches an AI agent against one of your MCP servers. Every workflow has two required parts:
- Prompt — The natural-language instruction the agent receives. Write it the way you would write a task for a human: be explicit about what the agent should do and which tools it should use.
- Criteria — One or more plain-English statements about observable outcomes the agent must produce. Each criterion is evaluated independently by a separate evaluator model after the agent finishes.
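The two required parts can be pictured as a simple definition object. This is an illustrative sketch only — the field names (`name`, `mcp_server_id`, and the `srv_123` ID format) are assumptions, not Armature's actual API schema:

```python
# Hypothetical workflow definition (field names are illustrative,
# not Armature's actual API schema).
workflow = {
    "name": "Order creation smoke test",
    "mcp_server_id": "srv_123",  # assumed ID format
    "prompt": (
        "Create a test order for SKU ABC-1 using the create_order tool, "
        "then report the returned order ID."
    ),
    "criteria": [
        "The agent calls create_order exactly once.",
        "The agent reports the order ID returned by the tool.",
    ],
}

def validate_workflow(wf: dict) -> list[str]:
    """Check the two required parts: a prompt and at least one criterion."""
    problems = []
    if not wf.get("prompt", "").strip():
        problems.append("prompt is required")
    criteria = wf.get("criteria", [])
    if not criteria or not all(isinstance(c, str) and c.strip() for c in criteria):
        problems.append("at least one non-empty criterion is required")
    return problems

print(validate_workflow(workflow))  # → []
```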
## Run
A run is a single execution of a workflow. Runs are created automatically on the workflow's schedule, or manually when you click Run now. Each run records:

- The tester model used
- The full prompt sent to the agent
- Every tool call the agent made, including inputs and outputs
- The evaluator's verdict on each criterion
- The overall run status

Runs are immutable after they complete. You can inspect a run's tool call trace and criterion results at any time from the Run history page or from the workflow's History tab.

See Runs for filtering and navigation.
| Status | Meaning |
|---|---|
| `pass` | All required criteria evaluated to pass. |
| `partial` | Some criteria passed, but at least one required criterion was partial or not fully met. |
| `fail` | One or more required criteria failed. |
| `error` | The agent itself encountered an unrecoverable error before the evaluator could run. |
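The status table above implies a precedence when rolling per-criterion verdicts up into one run status. This sketch shows one plausible aggregation rule consistent with the table; Armature's exact internal rules may differ:

```python
# Sketch of deriving an overall run status from per-criterion verdicts,
# following the precedence implied by the status table (error > fail >
# partial > pass). Not Armature's actual implementation.
def overall_status(verdicts: list[str], agent_errored: bool = False) -> str:
    if agent_errored:
        return "error"    # the agent failed before the evaluator could run
    if any(v == "fail" for v in verdicts):
        return "fail"     # any failing required criterion fails the run
    if any(v == "partial" for v in verdicts):
        return "partial"  # no failures, but something was not fully met
    return "pass"         # every required criterion passed

print(overall_status(["pass", "pass"]))     # → pass
print(overall_status(["pass", "partial"]))  # → partial
print(overall_status(["partial", "fail"]))  # → fail
```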
## Criteria
Criteria are the pass/fail assertions that define what a successful run looks like. Each criterion is a single plain-English statement written in the workflow editor's Success criteria tab. The evaluator model reads the agent's trace and grades each criterion independently.

Criterion verdicts:

| Verdict | Meaning |
|---|---|
| `pass` | The criterion is clearly satisfied by the evidence in the run. |
| `partial` | The criterion is partially satisfied — some evidence supports it, but not all conditions are met. |
| `fail` | The criterion is not satisfied. |

Write criteria as externally checkable outcomes. Include the specific evidence the evaluator should look for: which tool was called, what field appeared in the response, what was absent. Avoid vague criteria like "works correctly" — prefer "The agent calls `create_order` exactly once and reports the returned order ID."

One criterion per line. Each line is evaluated separately, so a single failing criterion does not prevent the others from passing.

See Workflow authoring guidelines for examples and best practices.
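The one-criterion-per-line rule can be sketched as a simple split: each non-empty line of the Success criteria tab becomes an independent assertion. This is an assumption about the editor's behavior, shown here for illustration:

```python
# Sketch: criteria entered one per line; each non-empty line becomes an
# independently evaluated criterion (assumed editor behavior).
def parse_criteria(text: str) -> list[str]:
    return [line.strip() for line in text.splitlines() if line.strip()]

raw = """
The agent calls create_order exactly once.

The agent reports the returned order ID.
"""
print(parse_criteria(raw))
```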
## Tool Monitor
A tool monitor is a lightweight, recurring health check for a single MCP tool. Unlike a workflow, a tool monitor does not run an AI agent — it calls the tool directly with a fixed set of arguments and records whether it succeeded.

Monitor intervals: 1 minute, 5 minutes, 15 minutes, or 1 hour.

Monitor statuses:

| Status | Meaning |
|---|---|
| `pass` | The tool responded without throwing and without returning `isError: true`. |
| `fail` | The tool returned `isError: true` in its response. |
| `error` | The transport itself failed — the server was unreachable or returned an unexpected error before MCP-level parsing. |

Monitors are configured from the MCP Servers page. After connecting a server, the monitor wizard discovers available tools and lets you select which ones to monitor and at what interval. Tools that require arguments ask for a JSON argument blob before saving.

You can have multiple monitors for the same server — one per tool you want to track. Monitors run independently and report individually.

See Tool monitors for setup and management.
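The three monitor statuses map cleanly onto two failure layers: the transport and the MCP result. A minimal sketch of that classification, where `call_tool` stands in for the actual MCP tool call and transport failures are modeled as exceptions:

```python
# Sketch of classifying one monitor check per the status table above.
# `call_tool` is a stand-in for an MCP tool invocation; a transport
# failure (unreachable server, bad response) is modeled as an exception.
def classify_check(call_tool) -> str:
    try:
        result = call_tool()  # e.g. {"isError": False, "content": [...]}
    except Exception:
        return "error"        # transport failed before MCP-level parsing
    return "fail" if result.get("isError") else "pass"

print(classify_check(lambda: {"isError": False}))  # → pass
print(classify_check(lambda: {"isError": True}))   # → fail

def unreachable():
    raise ConnectionError("server unreachable")

print(classify_check(unreachable))                 # → error
```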
## Coverage Report
A coverage report shows which tools in your MCP server's catalog are exercised by your workflows and which are not. Coverage is calculated from the most recent 50 runs for the server, across up to 100 active tools in the catalog.

A tool counts as covered if:

- It was successfully called in at least one recent run, or
- It is listed in a workflow's `allowed_tools` policy (even if not called in recent runs).

Coverage reports help you identify gaps in your test suite before they become production regressions. Use the report to decide which tools need new workflows.

Coverage is also accessible through the MCP repair API as a resource: `armature://mcp-servers/{id}/coverage-report`.

See Coverage reports for details.
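The two-part coverage rule can be sketched directly. This is an illustrative reimplementation of the rule as stated, not Armature's actual report code:

```python
# Sketch of the coverage rule above: a tool is covered if it was called
# in a recent run OR appears in some workflow's allowed_tools policy.
def coverage(catalog: set[str],
             recent_calls: set[str],
             allowed_tools_policies: list[set[str]]) -> dict[str, bool]:
    allowed = set().union(*allowed_tools_policies) if allowed_tools_policies else set()
    return {tool: tool in recent_calls or tool in allowed for tool in catalog}

report = coverage(
    catalog={"create_order", "cancel_order", "get_order"},
    recent_calls={"create_order"},
    allowed_tools_policies=[{"get_order"}],
)
print(report)  # cancel_order is the uncovered gap
```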
## Workflow Version
Workflows in Armature are immutable by version. When you edit a workflow's prompt, criteria, or tool policy and save, Armature does not overwrite the existing definition — it creates a new workflow version and makes it the active version going forward.

Past runs are always associated with the version that was active when they ran, so you can compare results before and after a change without losing historical data.

The MCP repair API's `propose_workflow_patch` and `apply_workflow_change` tools interact with workflow versions explicitly: a proposal targets a specific `baseVersionId`, and applying the proposal creates a new version. If the workflow has moved to a newer version before you apply, Armature returns `base_version_stale` with the current version ID so you can rebase your proposal.
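The stale-base behavior is a standard optimistic-concurrency pattern. A minimal sketch of the check, assuming a toy in-memory workflow record and an illustrative version-ID scheme (not Armature's server code):

```python
# Sketch of the base_version_stale check (optimistic concurrency).
# The workflow record shape and "v1+1" ID scheme are illustrative.
def apply_change(workflow: dict, base_version_id: str, patch: dict) -> dict:
    if workflow["active_version_id"] != base_version_id:
        # The workflow moved on; the caller should rebase the proposal.
        return {"error": "base_version_stale",
                "current_version_id": workflow["active_version_id"]}
    new_version_id = workflow["active_version_id"] + "+1"  # illustrative
    workflow["versions"][new_version_id] = patch
    workflow["active_version_id"] = new_version_id
    return {"applied_version_id": new_version_id}

wf = {"active_version_id": "v1", "versions": {"v1": {}}}
print(apply_change(wf, "v1", {"prompt": "updated"}))  # applies a new version
print(apply_change(wf, "v1", {"prompt": "again"}))    # stale: wf has moved on
```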
## Tool Policy
A tool policy constrains which tools a workflow’s agent is permitted to use. You can set it in the workflow editor’s advanced settings or through the MCP repair API.
- `allowed_tools` — An explicit list of tools the agent is expected to use. If you set `allowed_tools`, only those tools count toward coverage for this workflow. An empty `allowed_tools` means "open policy" — no restriction, and no coverage credit from the policy alone.
- `blocked_tools` — Tools the agent must never call during this workflow. If the agent calls a blocked tool, the run is automatically flagged.
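The policy semantics can be sketched as a per-call check. The `outside_policy` label is an assumption for illustration; the source only specifies that blocked-tool calls flag the run and that an empty `allowed_tools` means no restriction:

```python
# Sketch of tool-policy semantics: blocked tools flag the run; an empty
# allowed_tools list means open policy (no restriction).
def check_call(tool: str, allowed: list[str], blocked: list[str]) -> str:
    if tool in blocked:
        return "flagged"         # agent called a blocked tool
    if allowed and tool not in allowed:
        return "outside_policy"  # illustrative label, not from the docs
    return "ok"

print(check_call("create_order", allowed=["create_order"], blocked=[]))  # → ok
print(check_call("drop_table", allowed=[], blocked=["drop_table"]))      # → flagged
print(check_call("get_order", allowed=[], blocked=[]))                   # → ok
```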
## Alert Rule
An alert rule triggers a notification when a workflow or tool monitor reaches a failing state. Armature supports two delivery channels:
- Slack — Sends a message to a channel or user in your connected Slack workspace. Public channels can be entered as `#channel-name`. Private channels require you to invite the Armature bot first, then use the channel ID.
- Email — Sends an email to a specified address when a rule fires.
## Org roles
Every member of an Armature workspace has one of four roles. Roles control what actions that member can take in the dashboard and through the MCP repair API.
The workspace owner is the account that created the workspace. Owners and admins can invite teammates from Settings → Team.
| Role | What they can do |
|---|---|
| viewer | Read workflows, runs, monitors, coverage reports, and MCP API resources. Propose workflow patches. Compare runs. |
| editor | Everything a viewer can do, plus: apply workflow change proposals, trigger manual runs, create and edit monitors, and create alert rules. |
| admin | Everything an editor can do, plus: connect and delete MCP servers, sync tool catalogs, manage team invitations, and configure alert channels. |
| owner | Everything an admin can do, plus: manage billing, change workspace settings, and transfer ownership. |
API keys are frozen to the issuing user’s role at the time of creation. If your role is later upgraded, existing keys do not automatically gain new permissions — create a new key to pick up the expanded role.
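The role table is cumulative: each role can do everything the roles below it can. A minimal sketch of that hierarchy as a permission check (the helper itself is illustrative, not Armature's authorization code):

```python
# Sketch of the cumulative role model from the table above: each role
# includes every permission of the roles below it.
ROLE_RANK = {"viewer": 0, "editor": 1, "admin": 2, "owner": 3}

def can(role: str, required_role: str) -> bool:
    """True if `role` is at least as privileged as `required_role`."""
    return ROLE_RANK[role] >= ROLE_RANK[required_role]

print(can("editor", "viewer"))  # → True  (editors can do everything viewers can)
print(can("viewer", "admin"))   # → False (viewers cannot connect servers)
```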
## API Key
An API key is a bearer token that authenticates with the Armature MCP API (`/api/mcp`). Keys have the format `amt_<key-id>_<secret>` and are passed in the `Authorization` header.

Keys are:

- Shown once — The full token is displayed only when it is created. Armature stores only a scrypt hash server-side. Copy the token before closing the creation dialog.
- Scoped to your organization — A key only grants access to the workspace it was created in.
- Frozen to your role — The key inherits the permissions of your role at creation time.
- Revocable — Deleting a key from the dashboard immediately invalidates it. Subsequent MCP requests with that key return `401 unauthenticated`.
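Building the `Authorization` header from a key of this format can be sketched as follows. The helper and its validation rule are illustrative, not part of an Armature SDK:

```python
# Sketch: construct the Authorization header for an amt_<key-id>_<secret>
# token. The shape check is illustrative, not Armature's validation.
def auth_header(key: str) -> dict[str, str]:
    if not key.startswith("amt_") or key.count("_") < 2:
        raise ValueError("expected format amt_<key-id>_<secret>")
    return {"Authorization": f"Bearer {key}"}

print(auth_header("amt_abc123_s3cr3t"))
```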
## How the concepts connect
A typical Armature setup looks like this: you connect one or more MCP servers, then create workflows (scheduled agent tests) and tool monitors (direct tool pings) against those servers. Each time a workflow fires, Armature produces a run with a status and individual criteria verdicts. Coverage reports roll up across all runs to show which tools are tested. When something fails, alert rules notify your team on Slack or email. If you use the MCP repair API, you authenticate with an API key and use a client with your org role to triage and propose fixes.

Where to go next:

- Get started — Walk through connecting a server, setting up monitors, and creating your first workflow.
- Workflows in depth — Learn how to author effective prompts and criteria.
- MCP Servers — Connect servers, manage auth profiles, and run live probes.
- MCP Repair API — Use the agent-facing API to investigate and fix failing workflows.