Claude Opus 4.7: The Most Powerful AI Tool You Can Access — and the Questions It Raises for Business Leaders
- Johan Steyn

Anthropic's latest model is a genuine leap forward in agentic capability — but its hidden costs, selective improvements, and the existence of a more powerful model the public cannot access tell a more complicated story.

Video summary: https://youtu.be/1mkf9nfdirg
Sign up for my Substack daily AI newsletter here.
See my AI Training course portfolio for corporate Business Leaders here.
Follow me on LinkedIn: https://www.linkedin.com/in/johanosteyn/
On 16 April 2026, Anthropic released Claude Opus 4.7 — its most capable publicly available AI model to date. The release arrived in a charged context. On 2 April 2026, Stella Laurenzo, Senior Director of AMD's AI group, filed a detailed GitHub issue backed by a forensic analysis of 6,852 Claude Code session files, 17,871 thinking blocks, and 234,760 tool calls spanning January through March 2026. Her conclusion was unambiguous: Claude had regressed to the point it could no longer be trusted for complex engineering work.
As The Register reported, every senior engineer on her team had reported similar experiences, and Laurenzo warned that Anthropic was far from alone at the capability tier that Opus had previously occupied. As VentureBeat's investigation into the controversy found, Anthropic denied that any changes were made to redirect computing resources to other projects, with the Claude Code lead disputing the main conclusion of the analysis while acknowledging that real product changes had taken place. Opus 4.7 is, in part, a reputational reset. For business leaders making decisions about AI tools, platforms, and governance, this story is more than a product update: it is a case study in the complexity, opportunity, and risk of depending on AI systems that evolve, and sometimes deteriorate, without notice.
CONTEXT AND BACKGROUND
Claude Opus 4.7 is an incremental but meaningful upgrade over its predecessor, available across Anthropic's Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot, at the same price as Opus 4.6. As Axios reported on the release, Anthropic described it as a notable improvement in advanced software engineering, with particular gains on the most difficult tasks: users can now hand off their hardest coding work with confidence, and the model offers substantially better vision at higher resolution.
The benchmark numbers support the headline claims. According to VentureBeat's detailed analysis of the release, Opus 4.7 currently leads the market on the GDPVal-AA knowledge work evaluation with an Elo score of 1,753, surpassing both GPT-5.4 and Gemini 3.1 Pro. On SWE-bench Verified, the industry-standard test for agentic coding, the model jumped from 80.8 per cent to 87.6 per cent. On vision tasks, the improvement is even more striking: supported image resolution more than tripled, from 1.15 megapixels to 3.75 megapixels, producing a jump in visual navigation performance from 57.7 per cent to 79.5 per cent.
INSIGHT AND ANALYSIS
For business leaders, four features of Opus 4.7 deserve particular attention. The first is the improvement in agentic reliability — the model’s ability to complete long, multi-step tasks without losing context, skipping steps, or requiring constant human correction. Anthropic introduced what it calls self-verification, where the model builds its own checks before reporting a task as complete. Real-world partners confirmed this in practice: Rakuten measured three times more production tasks resolved compared to Opus 4.6, and Cursor reported an improvement in coding performance from 58 to 70 per cent on its internal benchmark.
The second is the introduction of task budgets, a feature allowing developers and organisations to set hard spending limits on autonomous AI agent runs. As Anthropic's own API documentation explains, task budgets let developers tell Claude how many tokens it has for a full agentic loop, covering thinking, tool calls, tool results, and output. The model sees a running countdown and uses it to prioritise work and finish gracefully as the budget is consumed, rather than cutting off mid-task or running up an unexpected bill. As VentureBeat noted in its analysis of the release, this hard ceiling on token spend ensures that a long-running debugging session does not result in a surprise invoice. For CFOs and finance teams already scrutinising AI infrastructure costs, this is a meaningful governance capability that has received insufficient attention in the coverage of the release.
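To make the mechanism concrete, here is a minimal sketch of the task-budget idea, implemented at application level with the Anthropic Python SDK. The coverage above does not show the request parameters for the native Opus 4.7 feature, so this version tracks the budget itself; the model ID, budget figure, and prompts are illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TASK_BUDGET = 50_000  # total tokens allowed for the whole loop (assumed figure)
spent = 0
messages = [{"role": "user", "content": "Debug the failing test suite."}]

while spent < TASK_BUDGET:
    remaining = TASK_BUDGET - spent
    response = client.messages.create(
        model="claude-opus-4-7",  # hypothetical model ID from the article
        max_tokens=min(4_096, remaining),  # never request more than what is left
        system=(
            f"You have roughly {remaining} tokens left for this task. "
            "Prioritise the most important work and finish gracefully."
        ),
        messages=messages,
    )
    # Every response reports usage; add both directions to the running total.
    spent += response.usage.input_tokens + response.usage.output_tokens
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason == "end_turn":
        break  # the model considers the task complete
    messages.append({"role": "user", "content": "Continue."})
```

The governance point is the loop condition: whatever the model decides to do, total spend can never exceed the ceiling the organisation set in advance.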
The third is the new xhigh effort level. As the AI Corner’s comprehensive guide to the release explains, xhigh — extra high — is a new setting exclusive to Opus 4.7, sitting between high and max, and is now the default in Claude Code for all plans. According to independent benchmark data compiled by LLM Stats in its head-to-head analysis of Opus 4.7 versus Opus 4.6, xhigh scoring approaches 71 per cent on complex coding tasks at around 100,000 tokens — already ahead of where Opus 4.6’s maximum effort sits at 200,000 tokens. This matters for enterprise use because it resolves a practical problem that previously had no clean solution: organisations deploying AI at scale were forced to choose between high effort, which could underperform on genuinely complex tasks, and maximum effort, which delivers the best results but at significantly higher token cost and latency. The xhigh setting provides a governed middle ground — and for finance teams managing AI infrastructure budgets at scale, that calibration capability is considerably more consequential than it might initially appear.
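For API users, selecting an effort level would presumably be a per-request setting. The sketch below passes it through the SDK's extra_body escape hatch; the article does not show the actual request shape for Opus 4.7's effort control, so the field name and model ID here are assumptions, not confirmed API.

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical request shape: the coverage above names the effort levels
# (high, xhigh, max) but not the API field that carries them, so "effort"
# and the model ID below are illustrative assumptions.
response = client.messages.create(
    model="claude-opus-4-7",          # hypothetical model ID
    max_tokens=8_192,
    extra_body={"effort": "xhigh"},   # assumed location of the effort control
    messages=[
        {"role": "user", "content": "Refactor this module to remove the cyclic import."}
    ],
)
print(response.content[0].text)
```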
The fourth, and the most important for leaders thinking about AI governance, is what this release reveals about model quality consistency. The Opus 4.6 regression that preceded this release was not announced. Users discovered it through degraded outputs. Anthropic denied deliberate changes. Whatever the cause, the episode illustrates a governance gap that most organisations have not addressed: there is no standard mechanism for detecting when an AI model you depend on has quietly gotten worse. As Build Fast with AI's detailed review of the release observes, Opus 4.7 arrives directly after weeks of public criticism, meaning it is partly a reputational reset, not just a capability upgrade.
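One way to close that gap is a standing regression suite: a fixed set of golden prompts with known-good answers, run on a schedule against the production model and compared against a stored baseline. The sketch below shows the shape of such a check; the cases, threshold, and model ID are illustrative assumptions, and a real suite would draw its tasks from your own workloads.

```python
import anthropic

client = anthropic.Anthropic()

GOLDEN_CASES = [
    # (prompt, substring the answer must contain) -- toy examples
    ("What is 17 * 23? Reply with the number only.", "391"),
    ("Name the capital of France in one word.", "Paris"),
]
BASELINE_PASS_RATE = 1.0   # recorded when the model was last known-good
ALERT_MARGIN = 0.10        # tolerate a 10-point drop before alerting

def run_suite(model: str) -> float:
    """Run every golden case against the model and return the pass rate."""
    passed = 0
    for prompt, expected in GOLDEN_CASES:
        response = client.messages.create(
            model=model,
            max_tokens=64,
            messages=[{"role": "user", "content": prompt}],
        )
        if expected in response.content[0].text:
            passed += 1
    return passed / len(GOLDEN_CASES)

if __name__ == "__main__":
    rate = run_suite("claude-opus-4-7")  # substitute your production model ID
    if rate < BASELINE_PASS_RATE - ALERT_MARGIN:
        print(f"ALERT: pass rate {rate:.0%} is below baseline {BASELINE_PASS_RATE:.0%}")
    else:
        print(f"OK: pass rate {rate:.0%}")
```

Run from a scheduler, a check like this turns "users discovered it through degraded outputs" into an alert that fires before the regression reaches production work.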
IMPLICATIONS
The potential problems with Opus 4.7 deserve equal attention to its strengths. The first is the tokeniser change. As NxCode documents, Opus 4.7 ships with an updated tokeniser that produces between 1.0 and 1.35 times as many tokens as Opus 4.6 for the same content. The per-token price is unchanged, but a 35 per cent increase in token count is effectively a 35 per cent cost increase for affected workloads. Organisations running high-volume AI applications need to benchmark this before migrating.
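The SDK's token-counting endpoint makes that benchmark straightforward: count the same representative payloads under both models and compare the ratio. In the sketch below, the model IDs and sample file are placeholders; the comparison logic is the point.

```python
import anthropic

client = anthropic.Anthropic()

# A typical production payload; substitute real prompts from your workload.
sample = open("representative_prompt.txt").read()

counts = {}
for model in ("claude-opus-4-6", "claude-opus-4-7"):  # hypothetical model IDs
    result = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": sample}],
    )
    counts[model] = result.input_tokens

ratio = counts["claude-opus-4-7"] / counts["claude-opus-4-6"]
print(f"Token inflation: {ratio:.2f}x, i.e. roughly {ratio - 1:.0%} higher cost")
```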
The second problem is the model’s more literal instruction-following. Opus 4.7 takes instructions precisely and at face value — which sounds like a benefit until organisations discover that prompts designed for Opus 4.6’s looser interpretation now produce unexpected results. Anthropic explicitly flags this as a migration concern. The third is a significant regression on long-context performance. Independent testing documented by Apiyi found that the model’s multi-round context recall benchmark dropped from 78.3 per cent to 32.2 per cent — a structural regression that explains developer complaints about the model failing to retain information from long documents. For organisations using AI to analyse lengthy contracts, reports, or policy documents, this is a material limitation.
Finally, there is the two-tier problem that sits above all of these. Anthropic openly describes its most capable publicly available model as less capable than Claude Mythos Preview, a more powerful model withheld from general access due to cybersecurity concerns and available only to a selected group of organisations, including Apple, Microsoft, Amazon, Google, JPMorgan Chase, and Nvidia. The existence of a better model that most businesses cannot access is a structural feature of the current AI market that business leaders need to factor into their platform strategies.
CLOSING TAKEAWAY
Claude Opus 4.7 is a genuine and meaningful improvement, particularly for organisations doing complex, multi-step agentic work, software engineering, and visual document analysis. For South African business leaders evaluating AI tools, it represents the current frontier of what is publicly accessible. But the story of this release (the complaints that preceded it, the hidden cost implications of the tokeniser change, the long-context regression that benchmarks do not advertise, and the existence of a superior model that most organisations cannot access) is also a story about the governance complexity of depending on AI systems that change without warning, cost more than they initially appear to, and exist within a commercial ecosystem where the most capable tools are not available to everyone. That is not a reason to avoid Claude Opus 4.7. It is a reason to approach it, and every AI tool like it, with clear eyes, proper measurement, and the governance frameworks that the pace of AI development now demands.
Author Bio: Johan Steyn is a prominent AI thought leader, speaker, and author with a deep understanding of artificial intelligence’s impact on business and society. He is passionate about ethical AI development and its role in shaping a better future. Find out more about Johan’s work at https://www.aiforbusiness.net


