verification sweep in progress.

The automated sandbox is currently working through the catalog. Verified entries will appear here as they complete auditing.

How verification works

01

Sandbox provisioning

We spin up an isolated Docker container specific to the tool under test: no production credentials, a confined filesystem, and monitored network egress.
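A minimal sketch of what provisioning such a container could look like, assuming a Python driver; the network name, mount paths, and per-tool image tag (`verify/<slug>`) are illustrative, not the real pipeline's:

```python
def sandbox_command(tool_slug: str) -> list[str]:
    """Build a `docker run` invocation with the isolation described above."""
    return [
        "docker", "run", "--rm",
        "--network", "sandbox-egress-monitor",  # egress routed through a monitored bridge (hypothetical network name)
        "--read-only",                          # confined filesystem: root is immutable
        "--tmpfs", "/workspace",                # the only writable path, discarded afterwards
        "--env-file", "/dev/null",              # no production credentials leak into the environment
        f"verify/{tool_slug}:latest",           # image built specifically for the tool under test
    ]
```

The flags shown (`--read-only`, `--tmpfs`, `--network`) are standard `docker run` options; the point is that isolation is declared up front rather than patched in after a failure.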

02

Installation check

We run the exact install command shown on the entry page. If it exits non-zero or stalls on an interactive prompt it doesn't document, verification fails here. No manual intervention.
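A sketch of that check in Python, assuming the command is run inside the sandbox from step 01; closing stdin is what makes undocumented interactive prompts fail rather than hang:

```python
import subprocess

def install_check(install_cmd: str, timeout: int = 300) -> bool:
    """Run the documented install command exactly as shown on the entry page.

    stdin is closed, so any undocumented interactive prompt hits EOF and
    the command fails; a non-zero exit or a timeout fails verification
    with no manual rescue.
    """
    try:
        result = subprocess.run(
            install_cmd,
            shell=True,
            stdin=subprocess.DEVNULL,  # interactive prompts get EOF instead of a human
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

The timeout guards against installers that wait silently for input instead of erroring out.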

03

Behavioral audit

A test agent (powered by Kimi k2) runs the tool against a suite of prompts. We check that outputs match documented claims, that error handling is graceful, and that the tool doesn't perform undocumented actions.
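The agent itself isn't reproducible here, but the audit loop around it can be sketched; `run_tool` stands in for the agent's tool call, and the substring claim check is deliberately simplified:

```python
def behavioral_audit(run_tool, prompts_and_claims):
    """Run the tool over a prompt suite and compare outputs to documented claims.

    run_tool: callable taking a prompt and returning the tool's output
              (a stand-in for the test agent's tool invocation).
    prompts_and_claims: list of (prompt, claimed_output_fragment) pairs.
    Returns a list of (prompt, reason) failures; empty means the audit passed.
    """
    failures = []
    for prompt, claimed in prompts_and_claims:
        try:
            output = run_tool(prompt)
        except Exception as exc:
            # Error handling must be graceful: an uncaught exception is a failure.
            failures.append((prompt, f"raised {type(exc).__name__}"))
            continue
        if claimed not in output:
            failures.append((prompt, "output does not match documented claim"))
    return failures
```

A real claim check would be richer than a substring match, but the shape is the same: every prompt either matches its documented behavior or lands in the failure list.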

04

Side-effect inspection

We inspect network egress (unexpected outbound calls), filesystem writes outside declared boundaries, and environment variable access. Any unexplained behavior flags the entry for human review.
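The filesystem part of that inspection reduces to a boundary check; a minimal sketch, assuming the monitor has already produced a list of observed write paths (egress and env-var access are checked the same way from their logs):

```python
from pathlib import PurePosixPath

def out_of_bounds_writes(observed_writes, declared_dirs):
    """Return the filesystem writes that fall outside declared boundaries.

    observed_writes: paths the sandbox monitor saw the tool write to.
    declared_dirs: directories the tool's entry page declares it uses.
    Any non-empty result flags the entry for human review.
    """
    declared = [PurePosixPath(d) for d in declared_dirs]
    flagged = []
    for path in observed_writes:
        p = PurePosixPath(path)
        if not any(p.is_relative_to(d) for d in declared):
            flagged.append(path)
    return flagged
```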

05

Transcript generation

The full session transcript is saved — every tool call, every response, every side effect observed. This becomes the comparison page linked from the entry's "View Behavior Analysis" button.
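A sketch of how such a transcript might be recorded, with illustrative field names; the only structural claim taken from the text is that every tool call, response, and side effect is captured:

```python
import json
import time

def record_event(transcript: list, kind: str, payload: dict) -> None:
    """Append one observed event to the session transcript.

    kind is one of "tool_call", "response", or "side_effect";
    the field names here are illustrative, not the real schema.
    """
    transcript.append({
        "ts": time.time(),   # when the event was observed
        "kind": kind,
        "payload": payload,
    })

def save_transcript(transcript: list, path: str) -> None:
    """Persist the full session for the behavior-analysis comparison page."""
    with open(path, "w") as fh:
        json.dump(transcript, fh, indent=2)
```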

06

Verdict

Entries receive one of three verdicts: verified (passed all checks), caveat (passed with documented limitations), or flagged (unexpected behavior — listed with explanation, not removed).
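The mapping from audit results to a verdict is small enough to sketch directly; the inputs here are simplified stand-ins for the real check records:

```python
def verdict(limitations: list, unexpected: list) -> str:
    """Collapse audit results into one of the three verdicts.

    limitations: documented caveats found during the audit.
    unexpected: unexplained behaviors from the side-effect inspection.
    """
    if unexpected:
        return "flagged"    # listed with a full explanation, never removed
    if limitations:
        return "caveat"     # passed, with documented limitations
    return "verified"       # passed all checks
```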

Verdict states

verified

Passed all sandbox checks. Installation, behavior, and side effects match documented claims.

caveat

Passed with documented limitations — e.g., requires network egress for a specific feature, or handles one edge case unexpectedly.

flagged

Unexpected behavior detected. Listed with a full explanation of what was found. Not removed — you deserve to know.

untested

Not yet through the queue. Most entries start here. The badge is absent — no false assurance.