Is your AI fit to operate under real regulation?

LogionACE is an independent, audit-style compliance evaluation for language models and AI agents: controls, verdicts, and disclosed exceptions, the way auditors evaluate organizations. Multi-jurisdictional, reproducible, and modeled after SOC 2 control testing.


6 domains / 21 controls

272 model test cases

12 jurisdictions

9 models evaluated

What ACE measures

Capability benchmarks miss the question enterprises actually ask

Most benchmarks measure how smart a model is. ACE measures whether an AI system behaves in a way that is safe to deploy under real regulatory obligations, across data protection, regulated industries, misuse resistance, agentic governance, transparency, and content integrity.

Audit-style

Controls, verdicts, and disclosed exceptions

6 trust domains, 21 controls
Verdicts: ACE Ready / Conditional / Not Ready
Every critical failure disclosed as an exception

Multi-jurisdictional

One global posture, 12 jurisdictions

Mapped to a structured regulatory corpus
English, Japanese, and Chinese cases
Deployment-scope aware scoring

Reproducible

Re-performable, not asserted

Deterministic checks plus a pinned cross-family judge
Tamper-evident evidence for agent runs
Anyone can re-derive the verdict

Leaderboard

Frontier models under the ACE battery

Point-in-time evaluation under ACE Protocol v1.1, Global Default profile. Each model is scored on the full battery with a cross-family LLM judge; critical failures are disclosed as exceptions.

Model	Vendor	Overall	Grade	Verdict	Critical exceptions	Helpfulness

Overall is the Global Default profile score. The published leaderboard adds a private holdout to the public cases. ACE verdicts describe behavior observed under the stated protocol version at evaluation time.

Agent evaluation

We don't just test models. We test agents.

An agent is defined by the tools it can touch. LogionACE provides a honeypot tool environment, run locally or as an MCP connector, and observes what an agent product actually does: every action is recorded to a tamper-evident evidence ledger and scored by a deterministic engine. This works even for app-only agents with no public API.

Agent profile	Overall	Grade	Verdict	Critical exceptions

Reference sample: a well-governed agent versus an ungoverned one, both run over the live honeypot suite. It demonstrates that the engine separates dangerous actions (forbidden tool calls, data exfiltration, followed prompt injections) from benign task completion. Destructive tools are honeypots: no real action is ever performed.
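To make the honeypot idea concrete, here is a minimal sketch of a tool that looks destructive to the agent but only records the attempt. It is an illustration under stated assumptions, not LogionACE's implementation: the class name `HoneypotTool`, the tool name `delete_database`, and the record fields are all hypothetical.

```python
class HoneypotTool:
    """A decoy tool: callable like a real one, but it performs no real action."""

    def __init__(self, name: str, forbidden: bool, log: list):
        self.name = name            # hypothetical tool name shown to the agent
        self.forbidden = forbidden  # whether calling it counts as a dangerous action
        self.log = log              # shared evidence log (a plain list in this sketch)

    def __call__(self, **kwargs) -> dict:
        # Record the attempted call; nothing is deleted, sent, or modified.
        self.log.append(
            {"tool": self.name, "args": kwargs, "forbidden": self.forbidden}
        )
        # Return a plausible, simulated result so the agent carries on.
        return {"status": "ok"}


evidence = []
delete_database = HoneypotTool("delete_database", forbidden=True, log=evidence)
result = delete_database(name="prod")
```

After the call, `evidence` holds one record describing the attempt, which a scoring engine could later count as a forbidden tool call.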

Methodology

Built the way SOC 2 is tested

A SOC 2 examination tests controls by inspection and re-performance. LogionACE applies the same discipline to AI behavior.

Inspection

Hash-chained evidence

Every tool call and result recorded
Tamper-evident: any edit breaks the chain
Continuous, programmatic evidence, not screenshots
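
As an illustration of why any edit breaks the chain, the following is a minimal hash-chain sketch using SHA-256 from the Python standard library. The entry layout, the `GENESIS` value, and the function names are assumptions made for this example; the actual ledger format is not described here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    # Each entry commits to the hash of the entry before it.
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(ledger: list) -> bool:
    # Recompute every hash from the start; one altered record invalidates the chain.
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Appending two tool-call records and then changing a field in the first makes `verify` return `False`, because the stored hash no longer matches the recomputed one.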

Re-performance

Deterministic scoring

Verdict is a pure function of the evidence
The customer can re-run and get the identical result
No grade inflation, no hidden assertions
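
A verdict that is a pure function of the evidence can be sketched as follows. The thresholds, the rule that a critical failure rules out "ACE Ready", the `Finding` fields, and the control IDs are all hypothetical, chosen only to show the shape of a deterministic scorer; they are not the ACE Protocol's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    control: str    # hypothetical control ID, e.g. "DP-1"
    passed: bool
    critical: bool  # whether a failure here counts as a critical exception


def verdict(findings, ready=0.90, conditional=0.70):
    """Return (label, score, exceptions) from the findings alone.

    No clock, randomness, or external state is consulted, so the same
    evidence always yields the same result, in any order.
    """
    findings = tuple(findings)
    if not findings:
        raise ValueError("no evidence to score")
    score = sum(f.passed for f in findings) / len(findings)
    # Critical failures are always listed, never silently averaged away.
    exceptions = tuple(
        sorted({f.control for f in findings if f.critical and not f.passed})
    )
    if score >= ready and not exceptions:
        label = "ACE Ready"
    elif score >= conditional:
        label = "Conditional"
    else:
        label = "Not Ready"
    return label, round(score, 4), exceptions
```

Because the function depends only on its inputs, a customer who holds the same evidence can re-run it and compare the output to the published verdict.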

Independence

Cross-family judging

Subjects are never judged by their own family
Authoritative regulatory text grounds obligation cases
Over-refusal penalized via Helpfulness Retention
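
The cross-family rule can be expressed as a small selection function. This is a sketch under assumptions: the judge identifiers and family labels below are placeholders, and the ordered preference list stands in for however the pinned judge is actually configured.

```python
def pick_judge(subject_family: str, judges: list) -> str:
    """Return the first judge whose family differs from the subject's.

    `judges` is an ordered list of (judge_id, family) pairs; the order is
    fixed ("pinned") so that selection is reproducible.
    """
    for judge_id, family in judges:
        if family != subject_family:
            return judge_id
    raise ValueError(f"no judge available outside family {subject_family!r}")


# Placeholder identifiers, not real model names.
PINNED_JUDGES = [("judge-a", "family-a"), ("judge-b", "family-b")]
chosen = pick_judge("family-a", PINNED_JUDGES)
```

Here a subject from `family-a` is routed to `judge-b`, and the function raises rather than falling back to a same-family judge.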

ACE is an independent engineering evaluation modeled after the structure of established audit frameworks. It is not affiliated with, and an ACE verdict is not an attestation under, SOC 2, HIPAA, PCI DSS, Common Criteria, or Euro NCAP.

Evaluation service

Get your model or agent evaluated

Beyond the public leaderboard, LogionACE delivers official, audit-style evaluation reports for your own models and agent products, with human QA and a reproducible evidence package. Opening at the end of June 2026.
