bleepo

Claude Sonnet 5 called a useless flop by fans despite near-Opus benchmarks

Anthropic released Claude Sonnet 5 on June 30, 2026, and the backlash started within minutes. The company pitched it as its most agentic Sonnet yet, a mid-tier model that gets close to the flagship Opus 4.8 at a fraction of the price. On paper the upgrade looks solid. In practice, a wave of developers, power users, and longtime fans called it a disappointment, a regression, or worse.

One post captured the mood and went viral fast. "Sonnet 5 is a useless release, absolute flop of a model, it is not even that fast or cheap," the user wrote. Replies were split, but the critical voices were loud, and the phrase "useless flop" stuck to the launch almost immediately.

Anthropic framed Sonnet 5 as a real step forward for everyday and agentic work. The company promised substantial gains on agentic coding, tool use, reasoning, and knowledge work over the previous Sonnet 4.6. It said performance gets close to Opus 4.8 on several key benchmarks, especially when users turn up a new effort setting, and that the model is far more cost-efficient on medium to high effort tasks.

The headline numbers back part of that story. On SWE-bench Pro, a test of multi-file coding pulled from real code repositories, Sonnet 5 scored 63.2 percent, up from Sonnet 4.6's 58.1 percent. On GDPval-AA v2, which measures real-world professional tasks, it scored 1,618, a statistical tie with Opus 4.8's 1,616. On Humanity's Last Exam it hit 57.4 percent, nearly matching Opus 4.8's 57.9 percent. Anthropic also reported strong gains on Terminal-bench and on agentic search and computer-use tests when effort is raised.

The big new feature is an effort dial. Users can set it to low, medium, high, xhigh, or max, trading more tokens and cost for better results on hard problems. Anthropic paired the launch with introductory pricing of 2 dollars per million input tokens and 10 dollars per million output tokens, running through August 31, 2026, before it reverts to the standard 3 dollars and 15 dollars.
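For a rough sense of what that pricing window means per request, here is a minimal Python sketch. The rates and the August 31 cutoff come from the figures above. The function name `request_cost`, the inclusive treatment of the cutoff date, and the sample token counts are assumptions made for illustration, not anything Anthropic has published.

```python
from datetime import date

# Prices in US dollars per million tokens, as quoted in the article.
INTRO = {"input": 2.00, "output": 10.00}
STANDARD = {"input": 3.00, "output": 15.00}

# ASSUMPTION: "through August 31, 2026" is treated as inclusive of that day.
INTRO_END = date(2026, 8, 31)


def request_cost(input_tokens: int, output_tokens: int, on: date) -> float:
    """Estimated dollar cost of one request on a given date (hypothetical helper)."""
    rates = INTRO if on <= INTRO_END else STANDARD
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Sample workload (made up): 1M input tokens and 100k output tokens.
print(f"{request_cost(1_000_000, 100_000, date(2026, 8, 31)):.2f}")  # prints 3.00
print(f"{request_cost(1_000_000, 100_000, date(2026, 9, 1)):.2f}")   # prints 4.50
```

The same workload costs 50 percent more the day after the window closes. The sketch does not model the effort dial, which would change the number of tokens used rather than the per-token rate.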

Sonnet 5 also ships with heavier safety guardrails by design. On one cybersecurity test, building a working Firefox exploit, the model scored 0 percent. Anthropic highlights that result as intentional, a sign of the model refusing dangerous work rather than a gap in capability.

The official numbers tell one story. Early hands-on testing and social media told another. The gap between the benchmark wins and how the model felt in daily use is the heart of the backlash.

A major complaint was cost in practice. Many users reported higher token consumption than expected, because the new tokenizer can map the same input to between 1.0 and 1.35 times as many tokens as before. That eats into the promised savings, and for some it made the cheaper per-token pricing feel like no discount at all.
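The arithmetic behind that complaint is easy to sketch. The 1.0 to 1.35 multiplier and the per-token prices are the figures reported in this article. The helper `effective_rate` is hypothetical, and the baseline (the same text billed at 3 dollars per million input tokens with no token inflation) is an assumption chosen for comparison, not a published Sonnet 4.6 price.

```python
def effective_rate(list_price: float, token_multiplier: float) -> float:
    """Dollars per million tokens of the original, pre-inflation text.

    If the same input turns into `token_multiplier` times as many tokens,
    each original token effectively costs that much more.
    """
    return list_price * token_multiplier


INTRO_INPUT = 2.00     # dollars per million input tokens, intro pricing
STANDARD_INPUT = 3.00  # dollars per million input tokens, standard pricing
BASELINE = 3.00        # ASSUMPTION: standard rate with no token inflation

for multiplier in (1.0, 1.35):
    intro = effective_rate(INTRO_INPUT, multiplier)
    standard = effective_rate(STANDARD_INPUT, multiplier)
    saving = 1 - intro / BASELINE  # intro discount left relative to the baseline
    print(f"x{multiplier:.2f}: intro ${intro:.2f}, "
          f"standard ${standard:.2f}, intro saving {saving:.0%}")
```

Under these assumptions, the worst-case multiplier shrinks a one-third intro discount to about 10 percent, and once standard pricing returns, any multiplier above 1.0 costs more than the baseline. The sketch covers input tokens only and ignores any extra output produced at higher effort settings.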

Longtime Sonnet users piled on too. Some said older versions, especially Sonnet 4.6, felt faster, smarter, or less censored on everyday prompts. Others complained about higher refusal rates, slower perceived speed, and what they called AI shrinkflation, the sense that each new release delivers less real improvement than the one before.

The heavy safety layer drew particular anger. It landed just weeks after stronger models like Fable 5 and Mythos 5 were suspended on June 12 under U.S. export controls. To some users, that timing made Sonnet 5 feel deliberately held back, a capable model wrapped in extra restrictions rather than a true step up.

The reaction spread beyond X. YouTube reviewers ran full tests with blunt titles like Claude Sonnet 5 is out and it is horrible, worst model by Anthropic ever, reflecting genuine frustration from power users who expected more. Even defenders tended to call it decent for the price or better than 4.6 in agentic workflows, but few called it exciting or revolutionary.

The release did not happen in a vacuum. Anthropic and other Western labs are working under growing U.S. export control pressure on advanced AI. Stronger models have faced restrictions, pushing companies toward safer, more limited releases. Sonnet 5's 0 percent score on the Firefox exploit test is presented as a feature, but many in the community see it as capability sacrificed for compliance.

There is history here too. Earlier Opus versions, 4.7 especially, already drew heavy criticism for feeling like downgrades in real use. Sonnet 5 arriving as the budget Opus alternative, while some users report regressions against older Sonnets, only deepened the skepticism. At the same time, Chinese models are closing the gap fast on both price and performance, giving users more alternatives than ever.

It is not all bad news. Sonnet 5 is a strong pick for developers and teams doing agentic coding, multi-step tool use, or professional knowledge work who want Opus-level results without Opus-level spend, especially during the intro pricing window. The effort dial genuinely offers more control than previous Sonnets did, and on pure benchmarks the model is a clear upgrade from 4.6.

The caution is for a specific crowd. Heavy power users who loved previous Sonnets for speed, low refusal rates, or creative and unconstrained work are the ones most likely to be let down. Many are sticking with 4.6 where it is still available, or trying other providers entirely.

The bottom line is a split verdict. Claude Sonnet 5 is a real technical improvement over 4.6 and reaches near-Opus performance on several metrics at lower cost, so on paper Anthropic delivered what it promised. But in the court of public opinion, especially among the developers and power users who drive model hype, it landed as a flop. Higher-than-expected token burn, visible safety constraints, and sky-high expectations in a brutally competitive 2026 AI market turned what should have been a quiet win into a PR headache.

Whether this is AI shrinkflation, regulatory reality, or simply a model that did not excite the base, one thing is clear: Sonnet 5 did not move the needle the way Anthropic, or its fans, hoped it would.

When was Claude Sonnet 5 released? June 30, 2026, as Anthropic's newest mid-tier model, positioned between everyday use and the flagship Opus 4.8.

How much does Claude Sonnet 5 cost? Introductory pricing is 2 dollars per million input tokens and 10 dollars per million output tokens through August 31, 2026, then 3 dollars and 15 dollars after that.

Is Sonnet 5 better than Sonnet 4.6? On benchmarks yes, with gains in agentic coding and reasoning, but some users report it feels slower or more restricted in everyday use.

What is the effort dial? A new setting from low to max that trades more tokens and cost for better results on hard tasks.

Why are people calling Sonnet 5 a flop? Higher token usage from a new tokenizer, heavier refusals and safety limits, and the feeling that the upgrade over 4.6 is too small to matter.
