Evaluate Axtary

A guided ~15-minute evaluation for someone trying Axtary for the first time. It explains what the product is for (with real situations), gives you a runbook to see it work, and tells you what to look for — so you're judging the behavior, not just copy-pasting commands.

If you're here because a friend asked you to try it: thank you. The most useful thing you can give back is where you got confused or stuck — see Tell us what tripped you at the end.

What Axtary is actually for

You're starting to let an AI agent (Claude Code, Cursor, a script, an MCP tool) do things, not just chat — open PRs, post to Slack, update tickets, touch a database. The moment an agent can act, "it has an API key" stops being good enough, because a token authorizes a channel ("may post to Slack"), not the content ("may post this message to this channel").

Axtary sits in front of those actions and decides — using deterministic policy — whether each one is allowed, needs a human, or is denied, before it executes. Concretely, the kinds of things it's meant to prevent:

  • "The agent opened a PR that edits infra/prod/ (or read a .env)." Policy denies the secret read outright and forces a human to approve any diff that touches protected paths — the agent can't quietly do it.
  • "The agent posted to the wrong Slack channel / DM'd a customer." A channel allowlist blocks the wrong destination; messages to external recipients require a human to approve the exact text first.
  • "A confused or prompt-injected agent did something with a valid token." Even holding a real credential, the agent can't swap an approved PR diff or Slack message after a human approved it — the approval is cryptographically bound to that exact payload's hash. (Axtary doesn't claim to stop an agent from being wrong or injected; it bounds what a wrong agent can do and makes every attempt attributable.)
  • "Prove to security what the agent did." Every decision and outcome is written to a hash-chained, offline-verifiable ledger — evidence for an audit, not just logs.

Keep those four situations in mind as you run the steps below — each command is showing you one of them.

Tier 1 — see the loop in ~10 minutes (no credentials)

Everything here runs in deterministic fake mode. Nothing to sign up for, no token to paste, nothing leaves your machine.

Install (see Quickstart for the from-source option):

npm install -g @axtary/cli
mkdir axtary-eval && cd axtary-eval
axtary init
axtary doctor connectors --config axtary.yml

init writes a starter policy: deny .env*/secrets/ reads, require step-up approval for auth/ · billing/ · infra/prod/ writes, allowlist #axtary-dev for Slack. doctor shows what each connector would need to go real — note it doesn't ask you for anything yet.

1. Watch a full run of safe + unsafe actions:

axtary demo --config axtary.yml

What to look for: each action prints allow / deny / step_up with a reason, and lands in the ledger at .axtary/actions.jsonl. This is the whole product in one screen — a policy decision on every action before it runs. (Maps to all four situations above.)

2. Make an agent get blocked reading a secret:

axtary proxy --config axtary.yml      # leave running
# in another terminal:
echo '{"cwd":"'$PWD'","tool_name":"Read","tool_input":{"file_path":"'$PWD'/.env.production"}}' \
  | axtary hook claude-code

What to look for: a deny naming the blocked path prefixes, written to the ledger. This is the "agent read a .env" situation, blocked. (deny is honored by Claude Code's CLI and VS Code extension — see the enforcement matrix for the one step-up caveat.)

3. See MCP tool-poisoning get caught:

axtary mcp drift-demo

What to look for: a poisoned tool definition (it tries to exfiltrate ~/.ssh / cloud creds) is denied mcp_definition_hash_not_allowed and the upstream tool is never called, while the reviewed version executes. This is provenance binding — the MCP sibling of content binding.

That's the core. If you only have ten minutes, stop here — you've seen deny, step-up, content namespace, and provenance.

4. Hand the audit trail to a third party (still no credentials):

axtary attest-ledger --config axtary.yml --out attestation.json
axtary verify-export attestation.json

What to look for: attest-ledger signs your ledger export into a self-contained attestation.json (export + signature + public key). verify-export prints VALID — and it's a standalone check: it needs only the bundle and a JOSE library, no trust in Axtary. Change a single byte of any record in the file and re-run: it flips to INVALID (ledger_record_hash_mismatch, exit 1). That's tamper-evidence an auditor can verify offline. (Maps to "prove to security what the agent did.")

There's more spine under the hood you don't drive from this runbook but can read about in Core concepts: proof-of-possession (a captured pass can't be replayed by another key), delegation (a sub-agent only ever gets a narrower pass), and budgets (a runaway agent is capped). You can also author your own rules and dry-run them with axtary policy simulate.

Tier 2 — the real loop + tamper demo (optional, your own sandbox)

This is the high-signal version, but it needs your own throwaway sandbox credentials (a private test repo + fine-grained token or GitHub App, a Slack app in a test channel, a Linear test issue). Budget ~30–60 minutes for setup. Do not point this at anything production. Setup details: Quickstart §5 and Credentials.

Once a connector is in real mode and doctor is green:

The real workflow — an agent reads a Linear issue, searches docs, opens a real draft PR, and posts an approved Slack update:

axtary run workflow github-pr-review --real --config axtary.real.yml \
  --repo <you>/<sandbox-repo> --linear-issue <KEY> --project <TEAM> \
  --slack-channel '#your-test-channel' \
  --head "axtary/eval-$(date +%s)" --approve-step-up

What to look for: a real draft PR appears in your sandbox repo, a real Slack message posts, and the step-up action records step_up_approved. The run ends with ledger valid: true. (Tamper-evidence is enforceable: axtary export-ledger refuses to export — ledger_export_invalid:…hash_mismatch — if any record was altered after the fact.)

The tamper demo — this is the differentiator. It approves the exact Slack payload, then mutates the message after approval and tries to send it:

axtary run workflow github-pr-review --real --config axtary.real.yml \
  --repo <you>/<sandbox-repo> --linear-issue <KEY> --project <TEAM> \
  --slack-channel '#your-test-channel' \
  --head "axtary/eval-tamper-$(date +%s)" --tamper

What to look for: the post is blocked before the Slack API is called (approval_action_hash_mismatch) — no tampered message is sent. The approval was bound to the original payload's hash, so the swap is rejected. This is the "confused/injected agent with a valid token" situation, stopped. Use a fresh --head each run so the GitHub step doesn't collide on an existing branch.

Tell us what tripped you

This is the part we actually need. The person who shared this gave you a one-time feedback link — open it and a short form asks exactly these:

  • Where did you get stuck, confused, or have to guess? (install, init, doctor, a flag, the output)
  • Did any output read as alarming or unclear when it shouldn't have?
  • Did the why land — could you tell which real situation each step was demonstrating?
  • Would you actually route a non-prod agent action through this? If not, what's missing?

The link works once, so jot notes as you go and submit at the end. Rough and blunt is more useful than polished. No link? Ask whoever shared this, or email support@axtary.com.

Where to go next

  • Core concepts — ActionPass, deterministic policy, the ledger, content-vs-channel authorization.
  • Integrate your agent — wire Claude Code or any MCP client through Axtary for real.