Anthropic releases open-source political-bias eval with 1,350 tests

Anthropic publishes open-source evaluation method to measure political bias with structured scoring

Anthropic introduces a new evaluation that measures political balance with numbers instead of impressions. The test covers 1,350 prompt pairs across 150 topics (nine pairs per topic on average), giving you a wide map of how a model behaves when pushed from opposite ideological angles. The scope makes this one of the most detailed public neutrality evaluations yet.

The Setup: What Anthropic Built

Anthropic designs a test that checks whether a model treats two conflicting prompts with equal depth.

Each prompt pair asks about a topic from one stance and from its opposite.
The grader inspects clarity, depth, tone, and structure.
Scoring rules define “even-handedness,” “opposing perspectives,” and “refusals.”
You see how consistently a model responds when the same topic is framed from opposing political sides.
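One way to picture the setup is as a small data model: a prompt pair per stance and a per-pair score for the three scoring rules named above. The sketch below is illustrative only. The class names, field names, and the choice to treat each rule as a yes/no flag averaged over pairs are assumptions, not the layout or aggregation used in Anthropic's repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    """Hypothetical container for one paired test case."""
    topic: str
    prompt_a: str  # request framed from one stance
    prompt_b: str  # the same request framed from the opposing stance


@dataclass(frozen=True)
class PairScore:
    """Hypothetical grader verdict for one pair (field names are assumptions)."""
    even_handed: bool            # both responses judged comparable in quality
    opposing_perspectives: bool  # responses acknowledged the other side
    refusal: bool                # at least one response declined to engage


def summarize(scores):
    """Return the fraction of pairs flagged for each scoring rule."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores to summarize")
    return {
        "even_handedness": sum(s.even_handed for s in scores) / n,
        "opposing_perspectives": sum(s.opposing_perspectives for s in scores) / n,
        "refusals": sum(s.refusal for s in scores) / n,
    }
```

Keeping the pair and its score as separate records makes it easy to re-grade stored outputs without re-running the model.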

The Problem: Subtle Bias Is Hard to See

Models rarely show bias through obvious statements.

Uneven detail reveals preference even when wording stays neutral.
Refusing one side's prompts more often than the other's skews outcomes by cutting off that side.
Missing caveats or counterpoints weaken balance.
You need a numeric way to inspect these patterns.
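To make the patterns above concrete, here is a minimal sketch of two crude numeric proxies: a word-count asymmetry between the two responses in a pair, and the gap in refusal rates between the two stances. These are simple heuristics invented for illustration, with hypothetical function names. They are not the grader rubric from the release, which inspects clarity, depth, tone, and structure.

```python
def length_asymmetry(response_a, response_b):
    """Relative word-count gap between two responses.

    Returns 0.0 when both have the same number of words and
    approaches 1.0 when one response is far longer than the other.
    """
    words_a = len(response_a.split())
    words_b = len(response_b.split())
    longest = max(words_a, words_b)
    if longest == 0:
        return 0.0  # both responses empty: no measurable gap
    return abs(words_a - words_b) / longest


def refusal_gap(refused_a, refused_b):
    """Refusal rate on stance-A prompts minus refusal rate on stance-B prompts.

    Each argument is a non-empty list of booleans, one per prompt,
    where True means the model refused.
    """
    if not refused_a or not refused_b:
        raise ValueError("both refusal lists must be non-empty")
    rate_a = sum(refused_a) / len(refused_a)
    rate_b = sum(refused_b) / len(refused_b)
    return rate_a - rate_b
```

A value near zero on both proxies does not prove balance, since equal length can still hide differences in tone or missing caveats, which is why a rubric-based grader is needed on top.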

The Breakthrough: Everything Is Open Source

Anthropic releases the full pipeline so you can run it end-to-end.

The repo includes all prompts, grader rubrics, and scripts.
You run your model across the paired prompts to generate outputs.
You pass both outputs to the grader to score each metric.
The process requires no Anthropic model unless you choose to use one.
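The steps above can be sketched as a short loop. This is an assumption-laden outline, not the repo's actual scripts: `run_eval`, the tuple layout of `pairs`, and the `model` and `grader` callables are all hypothetical names. Passing the model and grader in as plain functions reflects the point that any model can fill either role.

```python
def run_eval(pairs, model, grader):
    """Run a model over paired prompts and grade each pair.

    pairs:  iterable of (topic, prompt_a, prompt_b) tuples (assumed layout)
    model:  callable taking a prompt string and returning a response string
    grader: callable taking (prompt_a, response_a, prompt_b, response_b)
            and returning a dict of metric scores
    """
    results = []
    for topic, prompt_a, prompt_b in pairs:
        # Step 1: generate an output for each side of the pair.
        response_a = model(prompt_a)
        response_b = model(prompt_b)
        # Step 2: hand both outputs to the grader together, so it can
        # compare them rather than judge each in isolation.
        scores = grader(prompt_a, response_a, prompt_b, response_b)
        results.append(
            {
                "topic": topic,
                "response_a": response_a,
                "response_b": response_b,
                "scores": scores,
            }
        )
    return results
```

In practice `model` and `grader` would wrap whatever API or local inference call you use; storing the raw responses alongside the scores lets you audit individual grader decisions later.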

The Results: How Models Compare

Anthropic reports clear behavioral differences across the tested models.

Claude Opus 4.1 gives both sides of a prompt nearly identical depth and structure.
Claude Sonnet 4.5 maintains consistent balance but shows slightly less symmetry than Opus.
Gemini 2.5 Pro produces the most evenly matched responses across opposing viewpoints.
Grok 4 stays close behind Gemini with stable, symmetrical arguments on both sides.
GPT-5 shows uneven depth between viewpoints, with occasional shifts in tone or caution.
Llama 4 Maverick diverges the most, with shorter explanations or refusals on one side.

What You Can Do Now

You can test your own models under the same conditions.

Run the paired prompts.
Score with the grader.
Compare your results to Anthropic’s published baselines.
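For the comparison step, a minimal sketch might subtract a baseline's metric rates from your own. The function name and the dict-of-rates format are assumptions, and the baseline values themselves have to be copied from Anthropic's published results; none are supplied here.

```python
def compare_to_baseline(own, baseline):
    """Return own-minus-baseline for every metric present in both dicts.

    own, baseline: dicts mapping a metric name to a rate between 0 and 1.
    Metrics missing from either side are skipped rather than guessed.
    """
    shared = sorted(set(own) & set(baseline))
    # Round to four decimals to hide floating-point noise in the deltas.
    return {name: round(own[name] - baseline[name], 4) for name in shared}
```

A positive delta means your model scored higher than the baseline on that metric. Whether higher is better depends on the metric: it is for even-handedness, but not for refusals.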


Posted on: November 17, 2025, 3:06 pm Category: Uncategorized

By: Stephen Abram
