Verified Directory
Every entry here has passed our automated sandbox audit — installation verified, behavior documented, no unexpected side effects.
Verification sweep in progress.
The automated sandbox is currently working through the catalog. Verified entries will appear here as they complete auditing.
How verification works
Sandbox provisioning
We spin up an isolated Docker container with a controlled environment: no production credentials, a confined filesystem, and egress monitoring. Each tool under test gets its own container.
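As a rough illustration of what such a sandbox launch could look like, the sketch below builds a `docker run` command line. The function name, the `sandbox-` naming scheme, the `/workspace` path, and the `audit-net` network are all assumptions made for the example, not the directory's actual configuration. The flags themselves are standard `docker run` options.

```python
from typing import List


def build_sandbox_command(
    image: str,
    tool_name: str,
    network: str = "audit-net",  # hypothetical network where egress is observed
) -> List[str]:
    """Build a `docker run` argv for one tool's throwaway sandbox (sketch)."""
    return [
        "docker", "run",
        "--rm",                            # discard the container after the run
        "--name", f"sandbox-{tool_name}",  # one container per tool
        "--read-only",                     # root filesystem is read-only
        "--tmpfs", "/workspace",           # the only writable path (assumed)
        "--workdir", "/workspace",
        "--cap-drop", "ALL",               # drop Linux capabilities
        "--network", network,
        # No --env or --volume flags: nothing from the host is passed in,
        # so no production credentials reach the container.
        image,
    ]


if __name__ == "__main__":
    print(" ".join(build_sandbox_command("python:3.12-slim", "example-tool")))
```

Docker's default tmpfs options may be too restrictive for some installs, so a real setup would likely tune the mount size and options.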
Installation check
We run the exact install command shown on the entry page. If the command exits with an error, or stops for an interactive prompt that the entry doesn't document, verification fails at this step. We do not intervene manually.
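One way to express this check is to run the command with standard input closed, so that any interactive prompt fails instead of waiting for a human. This is a minimal sketch under assumptions: the function name `check_install`, the 300-second timeout, and the returned reason strings are illustrative, and commands that rely on shell features such as pipes would need different handling.

```python
import shlex
import subprocess
from typing import Tuple


def check_install(command: str, timeout_s: float = 300.0) -> Tuple[bool, str]:
    """Run an install command non-interactively; return (passed, reason)."""
    argv = shlex.split(command)
    if not argv:
        return False, "empty command"
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # a prompt reads EOF instead of blocking
            capture_output=True,
            text=True,
            timeout=timeout_s,         # assumed limit, not a documented value
        )
    except FileNotFoundError:
        return False, f"command not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}"
    return True, "ok"
```

Closing standard input is a design choice that matches the "no manual intervention" rule: a tool that silently waits for input either errors out on EOF or hits the timeout.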
Behavioral audit
A test agent (powered by Kimi K2) runs the tool against a suite of prompts. We check that outputs match the documented claims, that errors are handled gracefully, and that the tool performs no undocumented actions.
Side-effect inspection
We inspect network egress (unexpected outbound calls), filesystem writes outside declared boundaries, and environment variable access. Any unexplained behavior flags the entry for human review.
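The comparison described above can be sketched as a check of observed activity against what an entry declares. The `Declared` and `Observed` types, their field names, and the finding strings are hypothetical and exist only for this example. How events are actually captured inside the sandbox is out of scope here.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Declared:
    """What the entry says the tool is allowed to touch (hypothetical shape)."""
    hosts: FrozenSet[str] = frozenset()
    write_roots: Tuple[str, ...] = ()
    env_vars: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Observed:
    """What the sandbox recorded during the run (hypothetical shape)."""
    hosts: Tuple[str, ...] = ()
    writes: Tuple[str, ...] = ()
    env_reads: Tuple[str, ...] = ()


def _inside(path: str, root: str) -> bool:
    # Normalize first so "/workspace/../etc" is not mistaken for "/workspace".
    # Symlinks are not resolved in this sketch.
    p = PurePosixPath(posixpath.normpath(path))
    r = PurePosixPath(posixpath.normpath(root))
    return p == r or r in p.parents


def find_unexplained(declared: Declared, observed: Observed) -> List[str]:
    """Return one finding per observed action that the entry did not declare."""
    findings: List[str] = []
    for host in observed.hosts:
        if host not in declared.hosts:
            findings.append(f"egress: {host}")
    for path in observed.writes:
        if not any(_inside(path, root) for root in declared.write_roots):
            findings.append(f"write: {path}")
    for name in observed.env_reads:
        if name not in declared.env_vars:
            findings.append(f"env: {name}")
    return findings
```

Under this sketch, any non-empty result would send the entry to human review.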
Transcript generation
The full session transcript is saved — every tool call, every response, every side effect observed. This becomes the comparison page linked from the entry's "View Behavior Analysis" button.
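A transcript like this could be stored as one JSON object per line, in the order events occurred. The sketch below assumes that layout. The `kind` values and other field names are invented for illustration and are not the directory's real transcript schema.

```python
import json
from typing import Any, Dict, Iterable


def to_jsonl(events: Iterable[Dict[str, Any]]) -> str:
    """Serialize session events as JSON Lines, numbering each step from 1."""
    lines = []
    for step, event in enumerate(events, start=1):
        record = {**event, "step": step}  # step number reflects observed order
        lines.append(json.dumps(record, sort_keys=True) + "\n")
    return "".join(lines)


if __name__ == "__main__":
    session = [
        {"kind": "tool_call", "name": "search", "args": {"q": "example"}},
        {"kind": "response", "text": "ok"},
        {"kind": "side_effect", "detail": "write: /workspace/out.txt"},
    ]
    print(to_jsonl(session), end="")
```

An append-only, line-per-event format keeps the record easy to diff and to render step by step on a comparison page.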
Verdict
Entries receive one of three verdicts: verified (passed all checks), caveat (passed with documented limitations), or flagged (unexpected behavior — listed with explanation, not removed).
Verdict states
Verified: Passed all sandbox checks. Installation, behavior, and side effects match documented claims.
Caveat: Passed with documented limitations, e.g., requires network egress for a specific feature, or handles one edge case unexpectedly.
Flagged: Unexpected behavior detected. Listed with a full explanation of what was found. Not removed, because you deserve to know.
No badge: Not yet through the queue. Most entries start here. We show no badge rather than give false assurance.
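The three verdicts and the no-badge state can be modeled as a small decision function. This is a sketch under assumptions: the `decide_verdict` name and its inputs are hypothetical, and the precedence (unexpected behavior outranks documented limitations) is inferred from the descriptions rather than stated by the directory.

```python
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    VERIFIED = "verified"
    CAVEAT = "caveat"
    FLAGGED = "flagged"


def decide_verdict(
    audited: bool,
    limitations: List[str],
    unexpected: List[str],
) -> Optional[Verdict]:
    """Map audit results to a verdict; None means no badge is shown yet."""
    if not audited:
        return None                # still in the queue, so no badge
    if unexpected:
        return Verdict.FLAGGED     # listed with an explanation, not removed
    if limitations:
        return Verdict.CAVEAT      # passed, with documented limitations
    return Verdict.VERIFIED        # passed all checks
```

Returning `None` instead of a fourth enum member mirrors the page's point that an unaudited entry carries no badge at all.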