GPT-5.5 set a new bar

OpenAI drops its new frontier model GPT-5.5

“OpenAI shipped GPT-5.5 today (codename “Spud”), six weeks after 5.4 and one week after Opus 4.7. Interesting to see the release cadence now measured in “days since last frontier model.” It takes the top spot on Terminal-Bench 2.0 (82.7%), FrontierMath Tier 4 (35.4%), and long-context retrieval at 1M tokens, while Opus 4.7 hangs onto SWE-Bench Pro and MCP Atlas. API access is “coming very soon” at $5/$30 per million tokens, with the Pro variant at a spicy $30/$180.”

GPT-5.5 is here

“What’s new? OpenAI has introduced GPT-5.5, a new AI model designed to handle more complicated tasks with less hand-holding from the user.

Want more details? The main improvement is that GPT-5.5 can do more of the job on its own. OpenAI says it is better at writing and fixing code, searching the web, analyzing data, making documents and spreadsheets, using software tools, and sticking with longer tasks until they are finished. It is also supposed to understand unclear instructions more easily, make fewer mistakes, and use fewer tokens on many tasks, which makes it more efficient.

Why should you care? For everyday users, this could mean faster help with writing, planning, research, and computer tasks. For businesses and researchers, it could mean saving time on harder projects that used to require much more human effort.”

Via The AI Report

“OpenAI drops GPT-5.5

Our Report

OpenAI has released GPT-5.5, a model built for agentic work that can plan multi-step tasks, use tools, and check its own output with less supervision than previous versions. It handles coding, research, and document creation, and can navigate ambiguity without constant hand-holding.

Key Points

On Terminal-Bench 2.0, which tests complex command-line workflows, GPT-5.5 achieved 82.7% accuracy, while on SWE-Bench Pro for real-world GitHub issue resolution, it scored 58.6%, outperforming its predecessor.
While more capable, GPT-5.5 matches GPT-5.4’s per-token latency and uses fewer tokens to complete tasks, making it both smarter and more cost-efficient for enterprise deployments.
GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API pricing set at $5 per 1M input tokens and $30 per 1M output tokens.

Relevance

For enterprise teams already using AI for software development and knowledge work, GPT-5.5 offers a notable upgrade in autonomy and reasoning. The model’s ability to handle multi-step tasks with less hand-holding could reduce the need for constant prompt engineering. API pricing, while higher than GPT-5.4, is offset by the model’s token efficiency, which OpenAI says delivers better results with less compute.”


Via The Neuron

“OpenAI shipped GPT-5.5 to put Anthropic on notice.

OpenAI dropped GPT-5.5 yesterday, exactly seven days after Anthropic shipped Opus 4.7. We’re officially in a one-lab-one-week launch cadence, and nobody is blinking.

Here’s what happened:

GPT-5.5 is “worker-class,” meaning built to finish tasks instead of just answering questions. It’s live in ChatGPT and Codex for Plus, Pro, Business, and Enterprise; API access comes “very soon” at $5/$30 per million input/output tokens.
It wins Terminal-Bench 2.0 (82.7% vs Opus 4.7’s 69.4%) and ties or beats industry pros on 84.9% of GDPval tasks across 44 jobs. It loses SWE-Bench Pro (58.6% vs 64.3%) with an asterisk citing Anthropic’s own flagged “signs of memorization” on that eval.
It jumps from 27.1% to 35.4% on FrontierMath Tier 4 and helped discover a new proof about off-diagonal Ramsey numbers, later verified in Lean.
OpenAI rated it “High” on both bio/chem and cyber capability. Partner XBOW called it “Mythos-like hacking, open to all,” prompting Trusted Access for Cyber for vetted defenders.

Why this matters: This is the first clean “GPT beats Claude” moment in over a year, and it landed exactly seven days after Anthropic’s best. If you run agents in production, you probably want to re-test them on both models this weekend.

Our take: Full disclosure: Corey’s been a longtime GPT fan and stayed with them; Grant has not. Testing 5.5 was the first time in forever (since o3, maybe) Grant actually liked using a GPT model. He never got on board with the 5 series. He’d begrudgingly use 5.1 and 5.2 only when Claude rate-limited him. For the last 18 months he’s been a power Claude user.

5.5 hasn’t pushed him all the way over, but there’s something much nicer about using this one. It doesn’t write too much (unlike most “smart” GPTs before it). It doesn’t sound as dumb when thinking fast. And honestly, it feels a little bit like Claude.

Meanwhile Opus 4.7 feels like the opposite: it feels like a GPT. More tokens, clunkier to talk to, Claude’s signature vibe harder to find. Are we in uno reverse land? Is it opposite day? Did we enter a parallel universe?

There’s actually a real explanation. Every’s Dan Shipper noticed the same thing: Opus 4.7 “feels slow” next to 5.5 because OpenAI has a hardware advantage you can actually feel. And SemiAnalysis’ Dylan Patel pointed out that Anthropic silently went from an L4 engineer (Opus 4.6) to an L6 (Mythos) in two months. The Opus 4.7 the rest of us get is compute-starved and deliberately restrained. Anthropic is a Ferrari on fuel rationing; OpenAI just bought the gas station. (No offense meant.)”

Via AI Secret

“GPT-5.5 Raises the Agent War

What’s happening: OpenAI just released GPT-5.5, and the story is not just a smarter chat model. It is a more expensive, more capable execution engine for agent systems like OpenClaw. The uploaded report says GPT-5.5 beats Claude Opus 4.7 across many core benchmarks, while using roughly half the tokens at comparable intelligence.

How this hits reality: Pricing is the tell. Standard GPT-5.5 is listed at $5 input and $30 output per million tokens, while GPT-5.5 Pro jumps to $30 and $180. That only works if the model actually finishes more work per run. For OpenClaw, this stresses the Claude-first assumption around agents, context, and coding reliability.

Key takeaway: The agent era is becoming a routing war. Claude Opus 4.7 still has brand trust, but GPT-5.5 forces platforms like OpenClaw to optimize for cost, capability, and task completion instead of model loyalty.”
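
To make the pricing point concrete, here is a rough sketch of per-run cost at the list prices quoted above. The token counts are hypothetical, and the model labels are just dictionary keys for this example, not confirmed API identifiers.

```python
# Rates in USD per 1M tokens as quoted in the excerpts above: (input, output).
# The keys are illustrative labels, not confirmed API model names.
RATES = {
    "gpt-5.5": (5, 30),
    "gpt-5.5-pro": (30, 180),
}


def run_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one run at the quoted per-million-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical agent run: 200k input tokens and 20k output tokens.
standard = run_cost("gpt-5.5", 200_000, 20_000)
pro = run_cost("gpt-5.5-pro", 200_000, 20_000)
print(f"standard: ${standard:.2f}  pro: ${pro:.2f}  ratio: {pro / standard:.1f}x")
```

Because both Pro rates are six times the standard rates, the ratio stays at 6x for any mix of input and output tokens. On cost alone, Pro only pays for itself if it finishes the same work with well under a sixth of the tokens or retries.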

Via The Rundown AI

OpenAI retakes the frontier with GPT-5.5

“The Rundown: OpenAI just launched GPT-5.5 (codenamed ‘Spud’), the company’s long-awaited upgrade, pitched as a ‘new class of intelligence’ — topping benchmark scores across the industry and overtaking Anthropic on the AI model frontier.

The details:

5.5 sets highs across a series of reasoning, agentic, computer use, and coding tests for public models, with several scores comparable to Claude Mythos.
The model keeps the same speed as 5.4 with added efficiency, with OAI saying it used Codex and 5.5 to rewrite its own GPU code to improve infrastructure.
GPT-5.5 lands at $5/$30 per million input/output tokens for API pricing, with OAI pitching it as ‘half the cost of competitive frontier coding models.’
5.5 is rolling out across ChatGPT plans and in Codex with Thinking and Pro variants, with OAI continuing to highlight ‘generous usage’ for its new releases.

Why it matters: After months of Anthropic dominance, the vibe is shifting once again — with OpenAI rapidly shipping powerful new upgrades and rekindling the magic that felt a bit lost on previous releases. With Anthropic now wading through rate limit and quality degradation complaints, it’s a big week for Sama and co. on the sentiment front.”

Via Superhuman

“OpenAI unveils GPT-5.5: The new model leads across many common benchmarks, showing strength in knowledge work, computer use, and coding. OpenAI says it’s especially good at figuring out what you want and getting the task done — even when prompts are vague. GPT-5.5 can also take action across your connected workplace tools. Rolling out now to most paid plans. Watch GPT-5.5 in action.”

Via AI Ready

The week AI stopped being a product

“GPT-5.5 set a new bar, the coding wars got ugly, and Sullivan & Cromwell showed operators exactly how AI velocity breaks.

THE WEEK IN ONE SENTENCE

GPT-5.5 reset the model leaderboard on Thursday, but the more important story is that every layer underneath it, the cloud contracts, the coding tools, the legal filings, started behaving like critical infrastructure, with all the acquisition drama and liability exposure that entails.

THREE SIGNALS

01 • Models

GPT-5.5 landed with numbers that force another roadmap review

OpenAI shipped GPT-5.5 on Thursday, along with a GPT-5.5 Pro tier for Business and Enterprise plans. The headline benchmarks: 82.7% on Terminal-Bench 2.0 (Claude Opus 4.7: 69.4%, Gemini 3.1 Pro: 68.5%), 73.1% on OpenAI’s internal Expert-SWE, and 51.7% on FrontierMath Tiers 1-3. OpenAI also reports that the model uses significantly fewer tokens to complete the same Codex tasks as GPT-5.4. API availability is pending (“very soon,” per the launch post); ChatGPT and Codex rollouts started Thursday.

The “pick a model and standardize” plan most companies put in place in the first half of this year was built on GPT-5.4 / Opus 4.7 / Gemini 3 Pro pricing and performance. Those assumptions are now stale by a full leaderboard position. If you run a production workflow pinned to a specific model, the question worth asking your vendor this week is whether your contract lets you opt in to the new tier and on what timeline. Most enterprise contracts I’ve seen lock you to a specific model generation for twelve months, which means a better model on Thursday turns into a Q3 conversation, not a this-week one.

02 • Distribution

The coding wars got ugly, and acquisitions are how

Three separate stories landed this week that, taken together, describe a land grab rather than a product category:

SpaceX signed a deal giving it the right to acquire Cursor for $60 billion by year-end, or pay $10 billion if it walks. (CNBC)

Anthropic pulled Claude from Windsurf after Windsurf’s acquisition talks with OpenAI, a move widely read as signaling Anthropic will do the same to Cursor if SpaceX follows through.

OpenAI launched Workspace Agents in ChatGPT, bringing Codex-powered agents to Business, Enterprise, and Edu plans in production, not waitlist.

This is the pattern: every major lab wants to own the interface where developers write code, and they’re willing to cut off competitors’ models from competitors’ tools to get there. If your engineering org standardized on Cursor six months ago because it was the best tool, your standardization now has an ownership question attached.

The move this week: ask your engineering lead which models your coding tool actually has access to today, and whether any of those access relationships are one-pulled-plug away from breaking. If the answer is “I’ll check,” that’s the whole point.

03 • Velocity & Security

The velocity dividend showed up, so did the interest payments

Sundar Pichai confirmed at Cloud Next ’26 that 75% of new code at Google is AI-generated, up from 50% last fall. Intercom, a mid-market SaaS company, told Lenny Rachitsky’s podcast that it doubled engineering throughput in nine months using Claude Code. That’s the dividend.

The interest payments arrived in the same week. Security researcher @polsia disclosed CVE-2025-48757, a vulnerability in Lovable’s AI code generator that had propagated across 170+ apps on the platform. Anthropic’s Mythos was accessed through a breached third-party vendor. Vercel confirmed a breach via an OAuth token from an AI tool that one employee connected to. Lovable had a broken-authorization flaw exposing user credentials.

The companies shipping 2x faster aren’t automatically shipping 2x better. The teams that win this cycle treat AI-generated code with the same security gates as human-written code, and audit every AI integration with OAuth scope into their stack. If your security team can’t produce that inventory inside a week, you already have the answer.”
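
The inventory that last paragraph asks for can start as something very small. The sketch below is one possible shape, not a prescribed method: the `Integration` record, the `needs_review` helper, the scope names in `BROAD_SCOPES`, and the sample entries are all hypothetical and would need to be replaced with your providers' real scopes and your own tools.

```python
from dataclasses import dataclass, field

# Hypothetical scope names treated as high-risk; substitute your providers' real scopes.
BROAD_SCOPES = {"admin", "repo:write", "drive:full", "mail:read"}


@dataclass
class Integration:
    name: str
    owner: str  # accountable person or team; empty string if unknown
    scopes: set = field(default_factory=set)


def needs_review(integration):
    """Return the reasons this integration should be reviewed (empty list if none)."""
    reasons = []
    if not integration.owner:
        reasons.append("no owner")
    broad = sorted(integration.scopes & BROAD_SCOPES)
    if broad:
        reasons.append("broad scopes: " + ", ".join(broad))
    return reasons


# Hypothetical inventory entries, for illustration only.
inventory = [
    Integration("ai-notetaker", "", {"mail:read", "calendar:read"}),
    Integration("code-assistant", "platform-team", {"repo:write"}),
    Integration("design-plugin", "design-team", {"profile:read"}),
]

for item in inventory:
    reasons = needs_review(item)
    if reasons:
        print(f"{item.name}: {'; '.join(reasons)}")
```

Even a list this crude answers the two questions that matter first: who owns each connection, and which ones hold scopes broad enough to hurt if a token leaks.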


Posted on: April 24, 2026, 9:07 am Category: Uncategorized

By: Stephen Abram
