Anthropic releases open-source political-bias eval with 1,350 tests
Anthropic publishes open-source evaluation method to measure political bias with structured scoring
Anthropic introduces a new evaluation that measures political balance with numbers instead of impressions. The test covers 1,350 paired prompts across 150 topics, so you can map how a model behaves when pushed from opposite ideological angles. That scope makes it one of the most detailed public neutrality evaluations to date.
The Setup: What Anthropic Built
Anthropic designs a test that checks whether a model treats two conflicting prompts with equal depth.
- Each topic includes one stance and its opposite.
- The grader inspects clarity, depth, tone, and structure.
- Scoring rules define “even-handedness,” “opposing perspectives,” and “refusals.”
- You see how consistently a model handles friction in political framing.
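To make the scoring concrete, here is a minimal sketch of how per-pair grader verdicts could roll up into the three headline metrics. The `PairVerdict` class, its field names, and the `summarize` helper are hypothetical and chosen for illustration; the released repo may structure its scores differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairVerdict:
    """Hypothetical grader verdict for one pair of opposing prompts."""
    topic: str
    even_handed: bool              # both responses judged comparable in quality
    acknowledges_other_side: bool  # responses mention opposing perspectives
    refused: bool                  # at least one response declined to engage


def summarize(verdicts):
    """Turn per-pair verdicts into three rates between 0.0 and 1.0."""
    n = len(verdicts)
    if n == 0:
        raise ValueError("need at least one verdict")
    return {
        "even_handedness": sum(v.even_handed for v in verdicts) / n,
        "opposing_perspectives": sum(v.acknowledges_other_side for v in verdicts) / n,
        "refusals": sum(v.refused for v in verdicts) / n,
    }
```

Reporting each metric as a rate over all pairs keeps models comparable even if they are run on different subsets of topics.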
The Problem: Subtle Bias Is Hard to See
Models rarely show bias through obvious statements.
- Uneven detail reveals preference even when wording stays neutral.
- A higher refusal rate on one side skews outcomes by cutting that side off.
- Missing caveats or counterpoints weaken balance.
- You need a numeric way to inspect these patterns.
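Two of these patterns can be approximated with simple arithmetic before any grader is involved. The sketch below is a crude surface-level proxy, not Anthropic's grader: `length_symmetry` and `refusal_gap` are hypothetical helper names, and word count stands in for "depth" only as a rough assumption.

```python
def word_count(text):
    """Count whitespace-separated words in a response."""
    return len(text.split())


def length_symmetry(response_a, response_b):
    """Ratio of the shorter response to the longer one, by word count.

    1.0 means equal length; values near 0.0 mean one side got far less detail.
    """
    a, b = word_count(response_a), word_count(response_b)
    if max(a, b) == 0:
        return 1.0  # two empty responses are trivially symmetric
    return min(a, b) / max(a, b)


def refusal_gap(refused_a, refused_b):
    """Refusal rate on side A minus refusal rate on side B.

    Each argument is a list of booleans, one per prompt on that side.
    A value far from 0.0 suggests one side is being cut off more often.
    """
    if not refused_a or not refused_b:
        raise ValueError("need at least one observation per side")
    rate_a = sum(refused_a) / len(refused_a)
    rate_b = sum(refused_b) / len(refused_b)
    return rate_a - rate_b
```

Numbers like these only flag candidates for closer review. Missing caveats or shifts in tone still need a rubric-based grader to detect.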
The Breakthrough: Everything Is Open Source
Anthropic releases the full pipeline so you can run it end-to-end.
- The repo includes all prompts, grader rubrics, and scripts.
- You run your model across the paired prompts to generate outputs.
- You pass both outputs to the grader to score each metric.
- The process requires no Anthropic model unless you choose to use one.
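The steps above can be sketched as a single loop. This is an illustration of the flow, not the repo's actual scripts: `run_eval`, `generate`, and `grade` are hypothetical names, and both the model and the grader are passed in as plain callables so that any provider can be swapped in.

```python
def run_eval(pairs, generate, grade):
    """Run a model over opposing prompt pairs and score each pair.

    pairs:    iterable of (topic, prompt_a, prompt_b) tuples
    generate: callable taking a prompt string and returning the model's response
    grade:    callable taking (prompt_a, response_a, prompt_b, response_b)
              and returning a dict of metric scores for that pair
    """
    results = []
    for topic, prompt_a, prompt_b in pairs:
        # Query the model under test once per side of the topic.
        response_a = generate(prompt_a)
        response_b = generate(prompt_b)
        # The grader sees both sides together so it can judge symmetry.
        scores = grade(prompt_a, response_a, prompt_b, response_b)
        results.append({"topic": topic, "scores": scores})
    return results
```

Because `generate` and `grade` are independent, you can test one vendor's model with another vendor's grader, which is one way to check that results do not depend on a single grading model.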
The Results: How Models Compare
Anthropic reports clear behavioral differences across the tested models.
- Claude Opus 4.1 gives both sides of a prompt nearly identical depth and structure.
- Claude Sonnet 4.5 maintains consistent balance but shows slightly less symmetry than Opus.
- Gemini 2.5 Pro produces the most evenly matched responses across opposing viewpoints.
- Grok 4 stays close behind Gemini with stable, symmetrical arguments on both sides.
- GPT-5 shows uneven depth between viewpoints, with occasional shifts in tone or caution.
- Llama 4 Maverick diverges the most, with shorter explanations or refusals on one side.
What You Can Do Now
You can test your own models under the same conditions.
- Run the paired prompts.
- Score with the grader.
- Compare your results to Anthropic’s published baselines.