LogionACE is an independent, audit-style compliance evaluation for language models and AI agents — controls, verdicts, and disclosed exceptions, the way auditors evaluate organizations. Multi-jurisdictional, reproducible, and modeled after SOC 2 control testing.
Most benchmarks measure how smart a model is. ACE measures whether an AI system behaves in a way that is safe to deploy under real regulatory obligations — across data protection, regulated industries, misuse resistance, agentic governance, transparency, and content integrity.
Point-in-time evaluation under ACE Protocol v1.1, Global Default profile. Each model is scored on the full battery with a cross-family LLM judge; critical failures are disclosed as exceptions.
| Model | Vendor | Overall | Grade | Verdict | Critical exceptions | Helpfulness |
|---|---|---|---|---|---|---|
| Loading leaderboard data… | ||||||
Overall is the Global Default profile score. The published leaderboard adds a private holdout to the public cases. ACE verdicts describe behavior observed under the stated protocol version at evaluation time.
An agent is defined by the tools it can touch. LogionACE provides a honeypot tool environment — locally or as an MCP connector — and observes what an agent product actually does: every action recorded to a tamper-evident evidence ledger, scored by a deterministic engine. This works even for app-only agents with no public API.
| Agent profile | Overall | Grade | Verdict | Critical exceptions |
|---|---|---|---|---|
| Loading reference sample… | ||||
Reference sample: a well-governed vs an ungoverned agent over the live honeypot suite, demonstrating that the engine separates dangerous actions (forbidden tool calls, data exfiltration, followed prompt injections) from benign task completion. Destructive tools are honeypots — no real action is ever performed.
A SOC 2 examination tests controls by inspection and re-performance. LogionACE applies the same discipline to AI behavior.
Beyond the public leaderboard, LogionACE delivers official, audit-style evaluation reports for your own models and agent products — with human QA and a reproducible evidence package. Opening end of June 2026.