Anthropic releases Claude Opus 4.8 with better agentic judgment, dynamic workflows, and a 3x cheaper fast mode

Anthropic
مشاركة:
Anthropic releases Claude Opus 4.8 with better agentic judgment, dynamic workflows, and a 3x cheaper fast mode

Anthropic released Claude Opus 4.8 today, upgrading its flagship model with improvements across agentic reliability, coding, computer use, and honesty — all at the same price as Opus 4.7. The release is paired with three new product features that ship today: dynamic workflows in Claude Code, effort control on claude.ai, and a significantly cheaper fast mode.

What changed in Opus 4.8

The headline improvement is agentic judgment. Early testers across Cursor, Devin, Databricks, and several legal AI platforms describe Opus 4.8 as more reliable in long-running autonomous tasks: it asks clarifying questions before making large changes, catches its own mistakes, and is more likely to flag uncertainty rather than confidently produce incorrect outputs.

Anthropic quantifies one aspect of this: Opus 4.8 is approximately four times less likely than Opus 4.7 to allow flaws in code it has written to pass unremarked. That's a meaningful reliability improvement for any deployment where code review is happening downstream.

The model also fixes two specific issues from Opus 4.7 that engineers reported: excessive comment verbosity in generated code, and inconsistent tool-calling behavior. Both are confirmed resolved by Cognition (makers of Devin), which noted Opus 4.8 "uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need."

Benchmark results

On Online-Mind2Web — the standard benchmark for computer-use and browser-agent tasks — Opus 4.8 scores 84%, described by Anthropic as a "meaningful jump" over both Opus 4.7 and GPT-5.5. For products building on Anthropic's computer-use API, this is the most relevant number.

On the Super-Agent benchmark, Opus 4.8 is the only model tested to complete every case end-to-end. It beats prior Opus models and matches GPT-5.5 at cost parity — meaning equivalent agent performance at the same token spend.

On CursorBench, Opus 4.8 exceeds prior Opus versions at every effort level, with more efficient tool calling: fewer steps for equivalent intelligence on coding tasks.

On the Legal Agent Benchmark, Opus 4.8 sets the highest score recorded and becomes the first model to break 10% on the all-pass standard. CoCounsel (legal AI) and Harvey report improved consistency and reasoning quality on dense financial and legal document workflows.

Databricks reports Opus 4.8 runs at 61% cheaper token cost than Opus 4.7 in their Genie product, which handles multimodal reasoning over PDFs, diagrams, and unstructured content.

Alignment and honesty

Anthropic's alignment team assessed Opus 4.8 before release and found it reaches "new highs on measures of prosocial traits like supporting user autonomy and acting in the user's best interest." Rates of misaligned behavior — defined as deception or cooperation with misuse — are substantially lower than Opus 4.7, and comparable to Claude Mythos Preview, Anthropic's most alignment-optimized model. The full assessment is in the Claude Opus 4.8 System Card.

New features shipping today

Dynamic workflows (Claude Code, research preview): Claude Code can now plan work and spawn hundreds of parallel subagents within a single session, verify their outputs, and report back. Anthropic describes the capability as enabling "codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge." Available on Enterprise, Team, and Max plans.

Effort control (claude.ai and Cowork): A new control alongside the model selector lets users specify how much effort Claude puts into a response. At higher settings, Claude thinks more frequently and more deeply. At lower settings, it responds faster for tasks that don't require deep reasoning. This is distinct from the existing extended thinking toggle — it's a continuous slider rather than a binary switch.

Fast mode price cut: Fast mode for Opus 4.8 — which runs the model at 2.5× normal speed — is now three times cheaper than fast mode was for previous Opus models. For high-throughput use cases where fast mode was previously cost-prohibitive, this makes it viable.

Pricing and availability

Opus 4.8 is available today through the Anthropic API and on claude.ai at the same price as Opus 4.7. The model ID is claude-opus-4-8 (with a -20260528 date suffix for the versioned alias). Existing integrations targeting claude-opus-4-7 will need to update their model ID to access the new version.

The release continues Anthropic's pattern of shipping incremental Opus upgrades that meaningfully improve agentic reliability without changing the pricing tier. Opus 4.7 was the previous flagship; 4.8 replaces it as the recommended model for the most demanding deployments.

Originally reported by Anthropic. Read the original article for additional details.

View original source
مشاركة: