Armature is a continuous testing platform for MCP (Model Context Protocol) servers. You connect your MCP server, define test workflows with natural-language criteria, and Armature dispatches an AI agent on a schedule to exercise your tools and evaluate the results. When something breaks, Armature alerts your team and gives you the tools to diagnose and repair the failure, including an agent-facing MCP endpoint you can point your own AI coding assistant at.
What problems Armature solves
MCP servers change. Tools are renamed, arguments shift, behavior regresses. Without automated testing, you find out about failures when users report them. Armature solves two specific problems:
- Regressions go undetected. A tool that worked last week may silently return errors today. Armature runs your test workflows continuously on a schedule and fails fast when behavior changes.
- Test coverage is invisible. It is hard to know which tools in your MCP server are actually exercised by any test. Armature’s coverage reports show exactly which tools are covered by workflows and which are untested, so you can close gaps before shipping.
Key features
Workflows
Scheduled AI-agent tests. Each workflow has a natural-language prompt that the agent acts on, plus observable criteria that a separate evaluator model checks. Results are graded pass, partial, or fail per criterion.
Tool Monitors
Lightweight, recurring pings for individual MCP tools. Monitors run on a fixed interval — as often as every minute — and report pass, fail, or transport error without running a full agent workflow.
Coverage Reports
A live view of which tools in your MCP server’s catalog are exercised by workflows versus uncovered. Coverage updates after every run so you always know where your gaps are.
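The covered-versus-uncovered split described above amounts to a set comparison between the server's tool catalog and the tools that workflow runs actually called. The following is a minimal sketch of that idea, not Armature's implementation; the function name, the input shapes, and the example tool names are all hypothetical.

```python
def coverage_report(catalog, workflow_tool_calls):
    """Split a tool catalog into covered and uncovered tools.

    catalog: iterable of tool names exposed by the MCP server.
    workflow_tool_calls: mapping of workflow name -> tool names its runs called.
    Both shapes are assumptions made for this illustration.
    """
    catalog_set = set(catalog)

    # Union of every tool any workflow exercised.
    exercised = set()
    for tools in workflow_tool_calls.values():
        exercised.update(tools)

    # Only tools still present in the catalog count as covered; a call to a
    # tool that has since been renamed or removed does not.
    covered = catalog_set & exercised
    uncovered = catalog_set - exercised
    percent = 100.0 * len(covered) / len(catalog_set) if catalog_set else 0.0

    return {
        "covered": sorted(covered),
        "uncovered": sorted(uncovered),
        "percent": round(percent, 1),
    }


report = coverage_report(
    catalog=["search_docs", "create_ticket", "close_ticket", "list_users"],
    workflow_tool_calls={
        "ticket-lifecycle": ["create_ticket", "close_ticket"],
        "doc-search": ["search_docs", "renamed_tool"],
    },
)
print(report)
```

In this example list_users is the only uncovered tool, which is the kind of gap a coverage report is meant to surface.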
MCP Repair API
An agent-facing MCP endpoint at /api/mcp. Point your AI coding assistant at it to search runs, inspect failures, propose workflow patches, and trigger reruns, all from inside your editor.
Alerts
Slack and email notifications triggered by workflow failures or tool monitor errors. Alert rules are scoped per workflow or per monitor so you control the noise level.
API Keys
Bearer tokens that authenticate with the Armature MCP API. Keys are scoped to your organization and frozen to the role you hold when the key is created.
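As a rough illustration of how a bearer token attaches to a request for the /api/mcp endpoint, the sketch below builds (but does not send) a JSON-RPC 2.0 POST using Python's standard library. The host name and the key value are placeholders, since this page names only the path. tools/list is a standard MCP method, but a real MCP session normally begins with an initialize handshake, so in practice an MCP client library or your coding assistant's MCP configuration would handle the exchange.

```python
import json
import urllib.request

# Hypothetical values: the documentation names only the /api/mcp path.
BASE_URL = "https://armature.example.com"
API_KEY = "arm_example_key"  # placeholder, not a real key format


def build_mcp_request(base_url, api_key, method, params=None, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST for an MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API key travels as a standard HTTP bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_mcp_request(BASE_URL, API_KEY, "tools/list")
print(request.full_url)
print(request.get_header("Authorization"))
```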
Who uses Armature
Armature is built for teams that develop and maintain MCP servers, whether you are building an internal tool server, a public MCP integration, or an agent-facing API layer. You benefit most from Armature when:
- Your MCP server is updated frequently and you need automated regression coverage.
- You want continuous uptime visibility for individual tools, not just the server as a whole.
- You want to use an AI coding assistant to help triage and repair failing workflows without leaving your editor.
- Multiple teammates need to see the same test results and receive the same failure alerts.
Armature tests MCP servers over HTTP. Your server must expose a streamable_http or sse transport endpoint reachable from the internet. Stdio-only servers are not supported at this time.
How a test run works
When a workflow fires, Armature dispatches an AI agent with your prompt and the credentials you configured for your MCP server. The agent calls tools, reads resources, and produces a result. A separate evaluator model then checks each of your criteria against the agent’s trace and assigns a pass, partial, or fail verdict per criterion. The overall run status rolls up from the individual criterion results.
Tool monitors work differently: they call a single tool directly on the configured interval, without running an agent. A monitor passes if the tool returns without error. It fails if the tool returns isError: true, and errors if the transport itself fails to connect.
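The two status rules above can be sketched in a few lines of Python. The monitor classification follows the description directly. The roll-up rule is an assumption: this page does not say how criterion verdicts combine, so the sketch uses "worst verdict wins" purely for illustration. All function names are hypothetical.

```python
def classify_monitor_result(tool_result=None, transport_error=None):
    """Map one monitor ping to 'pass', 'fail', or 'error'."""
    # The transport never connected, so there is no tool result to inspect.
    if transport_error is not None:
        return "error"
    if tool_result is None:
        raise ValueError("need a tool_result or a transport_error")
    # MCP tool results flag tool-level failures with isError: true.
    if tool_result.get("isError") is True:
        return "fail"
    return "pass"


# ASSUMPTION: severity ordering used only by the illustrative roll-up below.
VERDICT_SEVERITY = {"pass": 0, "partial": 1, "fail": 2}


def roll_up(criterion_verdicts):
    """Assumed rule: a run takes the worst verdict among its criteria."""
    if not criterion_verdicts:
        raise ValueError("a run needs at least one criterion verdict")
    return max(criterion_verdicts, key=VERDICT_SEVERITY.__getitem__)


print(classify_monitor_result({"content": [], "isError": False}))
print(classify_monitor_result({"isError": True}))
print(classify_monitor_result(transport_error=ConnectionError("refused")))
print(roll_up(["pass", "partial", "pass"]))
```

Keeping the transport check first mirrors the distinction drawn above: a failed connection says nothing about the tool itself, so it is reported separately from a tool-level failure.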
Next steps
Get started
Connect your first MCP server and run your first workflow in minutes.
Key concepts
Learn the vocabulary: workflows, runs, criteria, monitors, coverage, and more.