
Documentation Index

Fetch the complete documentation index at: https://docs.armature.tech/llms.txt

Use this file to discover all available pages before exploring further.

Core concepts

Armature has a small set of core concepts that appear throughout the dashboard, API, and documentation. Understanding how they relate to each other will help you set up tests confidently and interpret results correctly.

MCP servers

An MCP server is any HTTP endpoint that speaks the Model Context Protocol. In Armature, an MCP server is the target you are testing. You register it by providing its base URL, transport type (streamable_http or sse), and authentication credentials.

When you first connect a server, Armature probes it live — calling initialize and listTools — to discover the tools it exposes. The discovered tools are stored in Armature’s tool catalog for that server and updated whenever you run a live probe or sync the catalog.

Each MCP server can have one or more auth profiles: named credential sets (bearer token or API key header) that workflows and monitors can reference. Credentials are stored encrypted and are never written to the dashboard database.

See Connecting an MCP server for setup instructions.
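Conceptually, a server registration carries the pieces above: a base URL, a transport, auth profiles, and a probed tool catalog. Here is a minimal sketch in Python — the class and field names are illustrative assumptions, not Armature's actual API schema:

```python
from dataclasses import dataclass, field

@dataclass
class AuthProfile:
    # Named credential set; "kind" is "bearer_token" or "api_key_header".
    name: str
    kind: str
    secret: str  # stored encrypted server-side, never in the dashboard database

@dataclass
class McpServer:
    base_url: str
    transport: str                 # "streamable_http" or "sse"
    auth_profiles: list = field(default_factory=list)
    tool_catalog: list = field(default_factory=list)  # filled by a live probe

server = McpServer(
    base_url="https://api.example.com/mcp",
    transport="streamable_http",
    auth_profiles=[AuthProfile("prod", "bearer_token", "<token>")],
)
print(server.transport)  # streamable_http
```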

Workflows

A workflow is a scheduled test that dispatches an AI agent against one of your MCP servers. Every workflow has two required parts:
  • Prompt — The natural-language instruction the agent receives. Write it the way you would write a task for a human: be explicit about what the agent should do and which tools it should use.
  • Criteria — One or more plain-English statements about observable outcomes the agent must produce. Each criterion is evaluated independently by a separate evaluator model after the agent finishes.
Workflows also have a schedule (a cron expression, or manual-only), a model selection (one or more tester models), and an optional tool policy (see below).

When a workflow fires, Armature creates a run — a single execution of the workflow. The workflow itself is a container for runs; the run is where the actual evidence lives.

See Workflows overview for authoring guidance.
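Putting those parts together, a workflow definition might look like the following sketch — every key and value here is illustrative, not Armature's actual schema:

```python
# Hypothetical workflow definition combining prompt, criteria, schedule,
# and model selection. Names are made up for illustration.
workflow = {
    "server": "orders-api",
    "prompt": "Create a test order for SKU ABC-123, then fetch it back by its ID.",
    "criteria": [
        "The agent calls create_order exactly once and reports the returned order ID.",
        "The agent calls get_order with that ID and reports the order's status.",
    ],
    "schedule": "0 */6 * * *",            # cron: every six hours; omit for manual-only
    "models": ["example-tester-model"],   # one or more tester models
}
print(len(workflow["criteria"]))  # 2
```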

Runs

A run is a single execution of a workflow. Runs are created automatically on the workflow’s schedule, or manually when you click Run now. Each run records:
  • The tester model used
  • The full prompt sent to the agent
  • Every tool call the agent made, including inputs and outputs
  • The evaluator’s verdict on each criterion
  • The overall run status
Run statuses:
  • pass — All required criteria evaluated to pass.
  • partial — Some criteria passed, but at least one required criterion was partial or not fully met.
  • fail — One or more required criteria failed.
  • error — The agent itself encountered an unrecoverable error before the evaluator could run.
Runs are immutable after they complete. You can inspect a run’s tool call trace and criterion results at any time from the Run history page or from the workflow’s History tab.

See Runs for filtering and navigation.
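The status table above amounts to a small aggregation rule over per-criterion verdicts. A sketch of that logic — not Armature's actual implementation:

```python
def overall_status(verdicts, agent_errored=False):
    """Derive a run status from criterion verdicts ("pass"/"partial"/"fail")."""
    if agent_errored:
        return "error"      # agent failed before the evaluator could run
    if any(v == "fail" for v in verdicts):
        return "fail"       # one or more required criteria failed
    if any(v == "partial" for v in verdicts):
        return "partial"    # at least one criterion only partially met
    return "pass"           # all required criteria passed

print(overall_status(["pass", "partial", "pass"]))  # partial
```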

Criteria

Criteria are the pass/fail assertions that define what a successful run looks like. Each criterion is a single plain-English statement written in the workflow editor’s Success criteria tab. The evaluator model reads the agent’s trace and grades each criterion independently.

Criterion verdicts:
  • pass — The criterion is clearly satisfied by the evidence in the run.
  • partial — The criterion is partially satisfied — some evidence supports it, but not all conditions are met.
  • fail — The criterion is not satisfied.
Write criteria as externally checkable outcomes. Include the specific evidence the evaluator should look for: which tool was called, what field appeared in the response, what was absent. Avoid vague criteria like “works correctly” — prefer “The agent calls create_order exactly once and reports the returned order ID.”

One criterion per line. Each line is evaluated separately, so a single failing criterion does not prevent the others from passing.

See Workflow authoring guidelines for examples and best practices.
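The one-criterion-per-line rule means the Success criteria field is effectively a list. A sketch of how such a field could be split into independently graded statements (illustrative only; not Armature's parsing code):

```python
# Hypothetical contents of a Success criteria field: one criterion per line.
raw = """The agent calls create_order exactly once and reports the returned order ID.
The final reply does not mention any error."""

# Each non-empty line becomes its own criterion, graded separately.
criteria = [line.strip() for line in raw.splitlines() if line.strip()]
print(len(criteria))  # 2
```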

Tool monitors

A tool monitor is a lightweight, recurring health check for a single MCP tool. Unlike a workflow, a tool monitor does not run an AI agent — it calls the tool directly with a fixed set of arguments and records whether it succeeded.

Monitor intervals: 1 minute, 5 minutes, 15 minutes, or 1 hour.

Monitor statuses:
  • pass — The tool responded without throwing and without returning isError: true.
  • fail — The tool returned isError: true in its response.
  • error — The transport itself failed — the server was unreachable or returned an unexpected error before MCP-level parsing.
Monitors are configured from the MCP Servers page. After connecting a server, the monitor wizard discovers available tools and lets you select which ones to monitor and at what interval. For tools that require arguments, the wizard asks for a JSON argument blob before saving.

You can have multiple monitors for the same server — one per tool you want to track. Monitors run independently and report individually.

See Tool monitors for setup and management.
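The monitor status table reduces to a simple classification over the transport result and the tool's response. A sketch under those rules — function and argument names are assumptions, not Armature internals:

```python
def classify_monitor_result(transport_ok, response=None):
    """Classify one monitor check into pass / fail / error (sketch)."""
    if not transport_ok:
        return "error"   # server unreachable or transport-level failure
    if response and response.get("isError") is True:
        return "fail"    # tool answered, but reported isError: true
    return "pass"        # tool responded cleanly

print(classify_monitor_result(True, {"content": []}))  # pass
```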

Coverage reports

A coverage report shows which tools in your MCP server’s catalog are exercised by your workflows and which are not. Coverage is calculated from the most recent 50 runs for the server, across up to 100 active tools in the catalog.

A tool counts as covered if:
  • It was successfully called in at least one recent run, or
  • It is listed in a workflow’s allowed_tools policy (even if not called in recent runs).
A tool is uncovered if it appears in the catalog but has not been called by any recent workflow run and is not in any allowed_tools policy.

Coverage reports help you identify gaps in your test suite before they become production regressions. Use the report to decide which tools need new workflows.

Coverage is also accessible through the MCP repair API as a resource: armature://mcp-servers/{id}/coverage-report.

See Coverage reports for details.
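The covered/uncovered rule above can be sketched as a set computation — the tool names are made up, and this is not Armature's implementation:

```python
def coverage_report(catalog, recent_called, allowed_by_policy):
    """A tool is covered if it was called in a recent run OR appears in any
    workflow's allowed_tools policy; everything else in the catalog is uncovered."""
    covered = {t for t in catalog if t in recent_called or t in allowed_by_policy}
    return {"covered": sorted(covered), "uncovered": sorted(set(catalog) - covered)}

report = coverage_report(
    catalog=["create_order", "get_order", "cancel_order"],
    recent_called={"create_order"},          # called in a recent run
    allowed_by_policy={"get_order"},         # covered via an allowed_tools policy
)
print(report["uncovered"])  # ['cancel_order']
```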

Workflow versions

Workflows in Armature are immutable by version. When you edit a workflow’s prompt, criteria, or tool policy and save, Armature does not overwrite the existing definition — it creates a new workflow version and makes it the active version going forward.

Past runs are always associated with the version that was active when they ran, so you can compare results before and after a change without losing historical data.

The MCP repair API’s propose_workflow_patch and apply_workflow_change tools interact with workflow versions explicitly: a proposal targets a specific baseVersionId, and applying the proposal creates a new version. If the workflow has moved to a newer version before you apply, Armature returns base_version_stale with the current version ID so you can rebase your proposal.
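The base-version check behaves like optimistic concurrency control. A sketch of that flow — the field names follow the API's baseVersionId naming, but integer version IDs and the dict shapes are assumptions:

```python
def apply_proposal(workflow, proposal):
    """Apply a workflow-change proposal only if its base version is still active;
    otherwise report base_version_stale so the caller can rebase (sketch)."""
    if proposal["baseVersionId"] != workflow["activeVersionId"]:
        return {"error": "base_version_stale",
                "currentVersionId": workflow["activeVersionId"]}
    new_version = workflow["activeVersionId"] + 1   # assumes integer version IDs
    workflow["activeVersionId"] = new_version
    return {"appliedVersionId": new_version}

wf = {"activeVersionId": 3}
print(apply_proposal(wf, {"baseVersionId": 2}))  # stale: current version is 3
```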

Tool policies

A tool policy constrains which tools a workflow’s agent is permitted to use. You can set it in the workflow editor’s advanced settings or through the MCP repair API.
  • allowed_tools — An explicit list of tools the agent is expected to use. If you set allowed_tools, only those tools count toward coverage for this workflow. An empty allowed_tools means “open policy” — no restriction, and no coverage credit from the policy alone.
  • blocked_tools — Tools the agent must never call during this workflow. If the agent calls a blocked tool, the run is automatically flagged.
Tool policies are validated against the server’s tool catalog at save time. If a policy entry references a tool that no longer exists in the catalog, Armature warns you.

See Workflow authoring guidelines for examples.
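Save-time validation amounts to checking each policy entry against the catalog. A sketch of that check — illustrative, not Armature's code:

```python
def validate_policy(policy, catalog):
    """Return a warning for every allowed_tools/blocked_tools entry that is
    missing from the server's tool catalog."""
    warnings = []
    for key in ("allowed_tools", "blocked_tools"):
        for tool in policy.get(key, []):
            if tool not in catalog:
                warnings.append(f"{key} references unknown tool: {tool}")
    return warnings

print(validate_policy(
    {"allowed_tools": ["create_order"], "blocked_tools": ["delete_everything"]},
    catalog={"create_order", "get_order"},
))
```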

Alert rules

An alert rule triggers a notification when a workflow or tool monitor reaches a failing state. Armature supports two delivery channels:
  • Slack — Sends a message to a channel or user in your connected Slack workspace. Public channels can be entered as #channel-name. Private channels require you to invite the Armature bot first, then use the channel ID.
  • Email — Sends an email to a specified address when a rule fires.
Alert rules are scoped: you can create one rule that alerts on all workflow failures, or narrow rules that alert only on specific workflows or specific monitors. This lets you route critical workflow failures to a dedicated alert channel while keeping lower-priority monitors quieter.See Slack alerts and Email alerts for setup instructions.
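Rule scoping can be pictured as a simple match between a failing event and a rule's scope. A sketch — the rule and event shapes are assumptions, not Armature's schema:

```python
def rule_matches(rule, event):
    """A rule with no scope matches every failing event; a scoped rule matches
    only the workflows/monitors listed in its scope (sketch)."""
    if event["status"] not in ("fail", "error"):
        return False
    scope = rule.get("scope")            # None means "all failures"
    return scope is None or event["source_id"] in scope

rule = {"channel": "#oncall", "scope": {"wf_checkout"}}
print(rule_matches(rule, {"source_id": "wf_checkout", "status": "fail"}))  # True
```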

Roles

Every member of an Armature workspace has one of four roles. Roles control what actions that member can take in the dashboard and through the MCP repair API.
  • viewer — Read workflows, runs, monitors, coverage reports, and MCP API resources. Propose workflow patches. Compare runs.
  • editor — Everything a viewer can do, plus: apply workflow change proposals, trigger manual runs, create and edit monitors, and create alert rules.
  • admin — Everything an editor can do, plus: connect and delete MCP servers, sync tool catalogs, manage team invitations, and configure alert channels.
  • owner — Everything an admin can do, plus: manage billing, change workspace settings, and transfer ownership.
The workspace owner is the account that created the workspace. Owners and admins can invite teammates from Settings → Team.
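Because each role strictly includes the one before it, permission checks reduce to a rank comparison. A minimal sketch of that idea:

```python
# Roles form a strict hierarchy; each level includes the previous one.
ROLE_RANK = {"viewer": 0, "editor": 1, "admin": 2, "owner": 3}

def can(role, required):
    """True if `role` is at or above the `required` level (sketch)."""
    return ROLE_RANK[role] >= ROLE_RANK[required]

print(can("editor", "viewer"))  # True:  editors can do everything viewers can
print(can("editor", "admin"))   # False: connecting servers needs admin or owner
```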
API keys are frozen to the issuing user’s role at the time of creation. If your role is later upgraded, existing keys do not automatically gain new permissions — create a new key to pick up the expanded role.

API keys

An API key is a bearer token that authenticates with the Armature MCP API (/api/mcp). Keys have the format amt_<key-id>_<secret> and are passed in the Authorization header:
Authorization: Bearer amt_<key-id>_<secret>
Keys are:
  • Shown once — The full token is displayed only when it is created. Armature stores only a scrypt hash server-side. Copy the token before closing the creation dialog.
  • Scoped to your organization — A key only grants access to the workspace it was created in.
  • Frozen to your role — The key inherits the permissions of your role at creation time.
  • Revocable — Deleting a key from the dashboard immediately invalidates it. Subsequent MCP requests with that key return 401 unauthenticated.
Create and manage API keys from Settings → API keys.

See MCP API authentication for how to use a key with an MCP client.
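The shown-once/scrypt-hash pattern described above can be sketched as follows — the scrypt parameters, token lengths, and helper names are assumptions, not Armature's internals:

```python
import hashlib
import os
import secrets

def mint_key():
    """Mint an amt_<key-id>_<secret> token; keep only an scrypt hash of the secret."""
    key_id = secrets.token_hex(4)
    secret = secrets.token_urlsafe(24)
    token = f"amt_{key_id}_{secret}"        # shown to the user exactly once
    salt = os.urandom(16)
    digest = hashlib.scrypt(secret.encode(), salt=salt, n=2**14, r=8, p=1)
    return token, (key_id, salt, digest)    # only the record is stored server-side

def verify(token, record):
    """Check a presented token against the stored (key_id, salt, digest) record."""
    key_id, salt, digest = record
    parts = token.split("_", 2)             # secret may itself contain underscores
    if len(parts) != 3:
        return False
    prefix, kid, secret = parts
    return (prefix == "amt" and kid == key_id
            and hashlib.scrypt(secret.encode(), salt=salt, n=2**14, r=8, p=1) == digest)

token, record = mint_key()
print(verify(token, record))                              # True
print(verify(f"amt_{record[0]}_not-the-secret", record))  # False
```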

How the concepts connect

A typical Armature setup looks like this: you connect one or more MCP servers, then create workflows (scheduled agent tests) and tool monitors (direct tool pings) against those servers. Each time a workflow fires, Armature produces a run with a status and individual criterion verdicts. Coverage reports roll up across recent runs to show which tools are tested. When something fails, alert rules notify your team on Slack or by email. If you use the MCP repair API, you authenticate with an API key that carries your workspace role, and use an MCP client to triage and propose fixes.

Get started

Walk through connecting a server, setting up monitors, and creating your first workflow.

Workflows in depth

Learn how to author effective prompts and criteria.

MCP Servers

Connect servers, manage auth profiles, and run live probes.

MCP Repair API

Use the agent-facing API to investigate and fix failing workflows.