

Anthropic releases open-source political-bias eval with 1,350 tests

Anthropic publishes an open-source evaluation method that measures political bias with structured scoring
Anthropic introduces a new evaluation that measures political balance with numbers rather than impressions. The test covers 1,350 paired prompts across 150 topics, mapping how a model behaves when the same issue is pushed from opposite ideological angles. That scope makes it one of the more detailed public neutrality evaluations available.

The Setup: What Anthropic Built

Anthropic designed a test that checks whether a model answers two prompts with opposing slants on the same topic with equal depth and quality.

  • Each prompt pair frames one topic from one stance and from its opposite.
  • The grader inspects clarity, depth, tone, and structure.
  • The scoring rubric defines three metrics: "even-handedness," "opposing perspectives," and "refusals."
  • The result shows how consistently a model responds when the political framing flips.
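To make the three metrics concrete, here is a minimal Python sketch of how per-pair grades could be rolled up into percentage scores. The `PairGrade` record and `summarize` function are hypothetical, and so is the convention used here (even-handedness counted once per pair, opposing perspectives and refusals counted per response); the released repo may define its schema and aggregation differently.

```python
from dataclasses import dataclass


# Hypothetical record for one graded prompt pair; field names are
# illustrative assumptions, not the released repo's schema.
@dataclass
class PairGrade:
    topic: str
    even_handed: bool  # grader judged both responses comparable in depth and quality
    opposing_a: bool   # response to stance A acknowledged counterarguments
    opposing_b: bool   # response to stance B acknowledged counterarguments
    refused_a: bool    # model declined stance A
    refused_b: bool    # model declined stance B


def summarize(grades: list[PairGrade]) -> dict[str, float]:
    """Roll per-pair grades up into percentage scores for the three metrics."""
    n = len(grades)
    if n == 0:
        raise ValueError("no graded pairs")
    responses = 2 * n  # each pair contributes two responses
    even = sum(g.even_handed for g in grades)
    opposing = sum(g.opposing_a + g.opposing_b for g in grades)
    refusals = sum(g.refused_a + g.refused_b for g in grades)
    return {
        "even_handedness": 100 * even / n,
        "opposing_perspectives": 100 * opposing / responses,
        "refusals": 100 * refusals / responses,
    }
```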

The Problem: Subtle Bias Is Hard to See

Models rarely show bias through obvious statements.

  • Uneven detail reveals preference even when wording stays neutral.
  • A higher refusal rate on one side skews outcomes by shutting that side out of the comparison.
  • Missing caveats or counterpoints weaken balance.
  • You need a numeric way to inspect these patterns.
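As a rough illustration of what a numeric check can look like, the sketch below computes two crude proxies: a word-count ratio between the two responses in a pair, and the gap in refusal rates between the two sides. Both functions (`depth_ratio`, `refusal_gap`) are hypothetical. The published evaluation relies on a model-based grader and rubric, word count is only a stand-in for depth, and missing caveats are not captured here at all.

```python
# Crude, hypothetical proxies for uneven detail and one-sided refusals.
# The released evaluation uses a model-based grader, not these heuristics.


def depth_ratio(response_a: str, response_b: str) -> float:
    """Word-count ratio of the shorter response to the longer one (1.0 = equal length)."""
    len_a, len_b = len(response_a.split()), len(response_b.split())
    if max(len_a, len_b) == 0:
        return 1.0  # two empty responses are trivially equal
    return min(len_a, len_b) / max(len_a, len_b)


def refusal_gap(refusals_a: list[bool], refusals_b: list[bool]) -> float:
    """Refusal rate on side A minus refusal rate on side B (0.0 = symmetric)."""
    if not refusals_a or not refusals_b:
        raise ValueError("need at least one response per side")
    rate_a = sum(refusals_a) / len(refusals_a)
    rate_b = sum(refusals_b) / len(refusals_b)
    return rate_a - rate_b
```

A ratio well below 1.0, or a refusal gap far from 0.0, flags pairs worth reading by hand before trusting any aggregate score.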

The Breakthrough: Everything Is Open Source

Anthropic releases the full pipeline so you can run it end-to-end.

  • The repo includes all prompts, grader rubrics, and scripts.
  • You run your model across the paired prompts to generate outputs.
  • You pass both outputs to the grader to score each metric.
  • The process requires no Anthropic model unless you choose to use one.
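The run-then-grade loop described above might look roughly like this. `run_eval`, `generate`, and `grade` are hypothetical names: `generate` stands in for the model under test and `grade` for whichever grader model you choose. The released scripts may be organized differently.

```python
from typing import Callable

# Hypothetical signatures: `Generate` wraps the model under test,
# `Grade` wraps the grader. Neither is the released repo's API.
Generate = Callable[[str], str]
Grade = Callable[[str, str, str, str], dict]


def run_eval(pairs: list[tuple[str, str]], generate: Generate, grade: Grade) -> list[dict]:
    """Generate a response to each side of every pair, then grade the two together."""
    results = []
    for prompt_a, prompt_b in pairs:
        response_a = generate(prompt_a)
        response_b = generate(prompt_b)
        # The grader sees both prompts and both responses so it can judge symmetry.
        scores = grade(prompt_a, response_a, prompt_b, response_b)
        results.append({"prompt_a": prompt_a, "prompt_b": prompt_b, **scores})
    return results
```

Because both callables are passed in, the same loop works with any model API, or with stubs for a dry run before spending tokens on all 1,350 pairs.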

The Results: How Models Compare

Anthropic reports clear behavioral differences across the tested models.

  • Claude Opus 4.1 gives both sides of a prompt nearly identical depth and structure.
  • Claude Sonnet 4.5 maintains consistent balance but shows slightly less symmetry than Opus.
  • Gemini 2.5 Pro produces the most evenly matched responses across opposing viewpoints.
  • Grok 4 stays close behind Gemini with stable, symmetrical arguments on both sides.
  • GPT-5 shows uneven depth between viewpoints, with occasional shifts in tone or caution.
  • Llama 4 Maverick diverges the most, with shorter explanations or refusals on one side.

What You Can Do Now

You can test your own models under the same conditions.

  • Run the paired prompts.
  • Score with the grader.
  • Compare your results to Anthropic’s published baselines.
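For the comparison step, a small helper like the hypothetical `compare_to_baseline` below reports per-metric differences in percentage points. It hard-codes no baseline numbers; you would fill `baseline` from Anthropic's published results yourself, using the same metric names as your own scores.

```python
def compare_to_baseline(yours: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Per-metric difference (yours minus baseline), in percentage points.

    Only metrics present in both dicts are compared; anything else is ignored.
    """
    shared = yours.keys() & baseline.keys()
    return {metric: yours[metric] - baseline[metric] for metric in sorted(shared)}
```

A positive delta on even-handedness or opposing perspectives means your model scored higher than the baseline; a positive delta on refusals means it refused more often.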

Posted on: November 17, 2025, 3:06 pm Category: Uncategorized
