Armature Workflows: Automated MCP Server Testing

Workflows are the core building block of Armature. Each workflow defines a tester prompt (the task an AI agent should accomplish) and a set of evaluation criteria that determine whether the run passed. Armature executes workflows on a schedule, records every tool call and trace event, and surfaces the results in the dashboard so you can catch a regression on the first scheduled run after it appears.

What a workflow contains

A workflow packages three things together: the prompt given to the tester agent, the evaluation criteria the judge model uses to score the run, and a schedule that controls when runs fire automatically. When a run completes, the judge evaluates each criterion independently and produces a per-criterion verdict, a roll-up status, and a full tool-call trace you can inspect. Every edit you make to a workflow creates a new immutable version. Armature keeps the full version history so you can trace exactly which prompt or criteria change coincided with a regression.
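As a rough mental model, a workflow version can be pictured as an immutable record of the three parts described above. The sketch below is illustrative only: `WorkflowVersion`, its field names, and the `edit` helper are hypothetical and are not Armature's actual schema or API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)  # frozen: a saved version cannot be mutated in place
class WorkflowVersion:
    """Hypothetical shape of one workflow version (not Armature's schema)."""
    number: int
    prompt: str                 # task given to the tester agent
    criteria: Tuple[str, ...]   # each criterion is judged independently
    schedule: str               # e.g. "hourly" or "manual" (assumed labels)


def edit(current: WorkflowVersion, **changes) -> WorkflowVersion:
    """Return a new version with the changes applied; the old one is untouched."""
    return replace(current, number=current.number + 1, **changes)


v1 = WorkflowVersion(
    number=1,
    prompt="List the open tickets.",
    criteria=("Calls the ticket search tool", "Returns at least one ticket"),
    schedule="hourly",
)
v2 = edit(v1, prompt="List the open tickets for the current sprint.")
```

Because every edit yields a new record and the old one stays as it was, a regression can be tied to the specific version in which the prompt or criteria changed.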

Workflow states

A workflow is either Active or Paused.

Active — the workflow runs automatically on its configured schedule and appears in coverage and health reports.
Paused — scheduled runs are suspended, but the workflow and all its past run data are preserved. You can resume it at any time by toggling it back to Active in the editor.

Pausing is useful during planned downtime or when you are iterating on a new version of the workflow and want to avoid noisy failures.

The workflow list

The Workflows page shows a table with one row per workflow. Each row displays:

Column	What it shows
Workflow	Name and description
Schedule	Human-readable schedule label (e.g. “Hourly”, “Manual”)
Pass rate (7d)	Percentage of finalized runs that passed in the last seven days, color-coded green / yellow / red
Last run	Relative timestamp of the most recent execution
Status	Active or Paused

Use the segmented control at the top of the page to switch between Active and Archived workflows. The search box filters the current tab by workflow name or description — useful when you have many workflows across multiple MCP servers.
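The search behavior can be approximated as a filter over the rows of the current tab. This is a minimal sketch that assumes case-insensitive substring matching, which the page does not confirm; the `filter_workflows` function and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_workflows(rows: Iterable[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Keep rows whose name or description contains the query.

    Assumption: matching is a case-insensitive substring test.
    """
    needle = query.strip().lower()
    if not needle:
        return list(rows)  # an empty search box shows every row
    return [
        row for row in rows
        if needle in row["name"].lower() or needle in row["description"].lower()
    ]


rows = [
    {"name": "Ticket search", "description": "Finds open tickets on the helpdesk server"},
    {"name": "Invoice export", "description": "Exports invoices from the billing server"},
]
matches = filter_workflows(rows, "billing")
```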

Archived workflows

Pausing or deleting a workflow moves it to the Archived tab instead of removing it permanently. Archived workflows do not run on their schedule, are excluded from coverage and health reports, and are hidden from the default Active list. The workflow definition, version history, and every past run are preserved. To bring an archived workflow back, switch to the Archived tab and click Restore on the row. The workflow returns to the Active list with its previous schedule and configuration intact, and resumes running on its next scheduled tick.

If you need to experiment with a major prompt change without affecting production runs, pause the workflow first, make your edits, and click Restore from the Archived tab once you are satisfied.

Finding and navigating workflows

From the workflow list you can:

Click Run history on any row to jump to that workflow’s runs filtered in the Runs page.
Click Edit to open the workflow editor, where you can update the prompt, criteria, schedule, and models.
Click New workflow to create a workflow from scratch.

The 7-day pass rate turns yellow when it falls below 95% and red when it falls below 80%. A healthy workflow stays green. If you see yellow or red, open the run history to identify which criteria are failing.
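The thresholds above map to a simple function. The name `pass_rate_color` is hypothetical; the 95% and 80% boundaries come from the text, and a rate exactly at a boundary is treated as not having fallen below it.

```python
def pass_rate_color(rate_percent: float) -> str:
    """Map a 7-day pass rate (0-100) to the color described in the text."""
    if rate_percent < 80:
        return "red"
    if rate_percent < 95:
        return "yellow"
    return "green"
```

For example, a workflow at exactly 95% stays green, while 94.9% shows yellow and 79.9% shows red.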

How the pass rate is calculated

The Pass rate (7d) column counts only finalized runs from the last seven days — runs whose status is completed, tester_failed, evaluation_failed, timed_out, or canceled. In-flight runs (queued, running, or evaluating) and archived runs are excluded from both the numerator and the denominator, so a workflow with two passing runs and one still-running run reports 100%, not 67%. The pass rate updates as soon as each in-flight run reaches a terminal status. In-flight runs are still visible in the workflow detail page under Active runs and on the Monitoring dashboard’s activity timeline — they just do not move the pass rate until they complete.
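The calculation can be sketched as follows. The status names are taken from the text; the `Run` record, its `passed` flag, and the use of a `finished_at` timestamp to decide the seven-day window are assumptions made for illustration, not Armature's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Terminal statuses listed in the text; everything else is in flight.
FINALIZED = frozenset(
    {"completed", "tester_failed", "evaluation_failed", "timed_out", "canceled"}
)


@dataclass(frozen=True)
class Run:
    """Hypothetical run record."""
    status: str
    passed: bool = False
    finished_at: Optional[datetime] = None  # assumed to define the 7-day window
    archived: bool = False


def pass_rate_7d(runs: Iterable[Run], now: datetime) -> Optional[float]:
    """Percentage of finalized, non-archived runs from the last 7 days that passed.

    Returns None when no run qualifies, so "no data" is not confused with 0%.
    """
    cutoff = now - timedelta(days=7)
    counted = [
        r for r in runs
        if r.status in FINALIZED
        and not r.archived
        and r.finished_at is not None
        and r.finished_at >= cutoff
    ]
    if not counted:
        return None
    return 100.0 * sum(r.passed for r in counted) / len(counted)


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
runs = [
    Run("completed", passed=True, finished_at=now - timedelta(hours=3)),
    Run("completed", passed=True, finished_at=now - timedelta(days=2)),
    Run("running"),  # in flight: excluded from numerator and denominator
]
rate = pass_rate_7d(runs, now)  # 100.0, matching the example in the text
```

Once the third run reaches a terminal status it enters the denominator, and the rate is recomputed from all three runs.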

Workflow versions

Every time you save changes to a workflow, Armature creates a new immutable workflow version. The current version is what the tester agent runs. The version history is visible in the History tab of the workflow editor and is accessible via the MCP repair API for diffing and regression analysis.
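To illustrate the kind of diffing that version history enables, the sketch below compares the prompt text of two versions with Python's standard `difflib` module. It runs on plain strings and does not call the MCP repair API; the `diff_prompts` helper and the sample prompts are hypothetical.

```python
import difflib
from typing import List


def diff_prompts(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> List[str]:
    """Return a unified diff between two prompt texts, one entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )


old_prompt = "Search for open tickets.\nSummarize the top three."
new_prompt = "Search for open tickets.\nSummarize the top five."
changes = diff_prompts(old_prompt, new_prompt)
```

Lines prefixed with `-` and `+` in the result show what changed between the two versions, which can help narrow down whether a prompt edit coincided with a drop in pass rate.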

Learn more

Authoring workflows

How to write effective prompts and evaluation criteria that produce reliable, deterministic results.

Scheduling

Choose the right schedule for your workflow — from manual-only to fully automated cron runs.

Run results

Understand pass, partial, fail, and error statuses. Inspect traces, compare runs, and diagnose failures.

MCP API overview

Use the Armature MCP repair API to propose patches, trigger runs, and compare results from your own agents.
