
Anthropic Releases Claude Opus 4.7 as the New Benchmark for Autonomous AI Agents

By 商长君 | Apr 22, 2026, 12:04 a.m. ET

Anthropic has launched Claude Opus 4.7, a major update focused on "agentic" task execution and high-resolution visual perception. The model significantly outperforms GPT-5.4 and Gemini 3.1 Pro in complex enterprise benchmarks, particularly in long-context document analysis and software engineering. While maintaining existing price tiers, the update introduces a new tokenizer that increases token consumption, signaling a shift toward more autonomous, albeit more resource-intensive, AI workflows.

NextFin News - Anthropic has officially released Claude Opus 4.7, a model the company describes as its most capable general-purpose AI to date, signaling a decisive shift in the industry from conversational fluency toward autonomous task execution. The launch, which occurred late Wednesday, introduces a model specifically engineered for "agentic" workflows—tasks that require the AI to operate independently over long durations with minimal human intervention. While the previous iteration, Opus 4.6, focused on reasoning depth, the 4.7 update prioritizes high-resolution visual perception and the stability of complex, multi-step software engineering chains.

The performance gap between Opus 4.7 and its primary competitors has widened significantly in enterprise-grade benchmarks. In the OfficeQA Pro evaluation, which requires parsing nearly 90,000 pages of historical U.S. Treasury documents, Opus 4.7 achieved a score of 80.6%. That is nearly double the score of Google’s Gemini 3.1 Pro (42.9%) and well ahead of GPT-5.4 (51.1%). The data suggests that Anthropic has made substantial progress on a critical bottleneck in long-context retrieval, particularly in the BFS 1M test, where the model’s score improved by over 17 percentage points compared to its predecessor.
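For readers who want to check the relative standings, the reported OfficeQA Pro scores can be compared with a few lines of arithmetic. The sketch below uses only the three percentages quoted in this article; the helper name `compare` is illustrative and not part of any benchmark tooling.

```python
# OfficeQA Pro scores as reported in this article (percent).
scores = {
    "Claude Opus 4.7": 80.6,
    "GPT-5.4": 51.1,
    "Gemini 3.1 Pro": 42.9,
}


def compare(leader: str, scores: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (score ratio, percentage-point gap) of the leader versus each other model."""
    top = scores[leader]
    return {
        name: (round(top / value, 2), round(top - value, 1))
        for name, value in scores.items()
        if name != leader
    }


comparison = compare("Claude Opus 4.7", scores)
for name, (ratio, gap) in comparison.items():
    print(f"vs {name}: {ratio}x the score, {gap} points ahead")
```

On these figures, Opus 4.7 scores about 1.88 times Gemini 3.1 Pro and about 1.58 times GPT-5.4, leads of 37.7 and 29.5 percentage points respectively.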

Visual intelligence represents the most aggressive leap in this version. Opus 4.7 now supports image inputs of up to 2,576 pixels on the longest side, a threefold increase in resolution that allows the model to "see" UI elements as small as 0.07% of a screen. In the ScreenSpot-Pro benchmark, which tests an AI’s ability to locate specific buttons or data points within professional software like AutoCAD or Photoshop, Opus 4.7 reached an 87.6% success rate when paired with tool-calling capabilities. This precision is no longer a mere academic exercise; it is the prerequisite for "Computer Use," the ability for an AI to navigate a desktop environment like a human employee.
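To put those figures in concrete terms, the sketch below scales a screenshot so its longest side fits the reported 2,576-pixel limit, then estimates how large a target covering 0.07% of the screen would be. It rests on two assumptions not stated in the source: that the 0.07% figure refers to screen area, and that the screenshot is a 3840x2160 capture. The function names are hypothetical, and nothing here describes how Anthropic's API actually resizes images.

```python
import math

MAX_LONG_SIDE = 2576  # longest-side pixel limit, as reported in this article


def fit_longest_side(width: int, height: int, limit: int = MAX_LONG_SIDE) -> tuple[int, int]:
    """Scale (width, height) down proportionally so the longest side is at most `limit`."""
    longest = max(width, height)
    if longest <= limit:
        return width, height  # already within the limit; leave unchanged
    scale = limit / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def target_side_px(width: int, height: int, area_fraction: float = 0.0007) -> float:
    """Side length of a square covering `area_fraction` of the image area (assumed meaning of 0.07%)."""
    return math.sqrt(width * height * area_fraction)


# Assumed example: a 4K (3840x2160) screenshot.
w, h = fit_longest_side(3840, 2160)
side = target_side_px(w, h)
print(f"resized to {w}x{h}; a 0.07%-area target is roughly {side:.0f} px square")
```

Under these assumptions the screenshot becomes 2576x1449, and a 0.07% target is roughly a 51-pixel square, about the size of a toolbar icon in dense professional software.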

The economic implications of such precision are beginning to manifest in specialized fields. In structural biology, Opus 4.7’s reasoning scores rose from 30.9% to 74.0% in a single version. This 2.4-fold increase in molecular reasoning suggests that the model is moving beyond general assistance into the realm of specialized scientific research. For the broader labor market, the model’s improved instruction-following means it is less likely to "hallucinate" or skip steps in a brief, though Anthropic warns that this rigidity may require users to rewrite older, more conversational prompts to avoid unexpected outputs.

Efficiency, however, comes with a literal cost. The new model uses a revised tokenizer that increases token consumption by approximately 10% to 35% for the same input. Although Anthropic has kept pricing at parity with the 4.5 and 4.6 tiers, the higher token count means the same workload will burn through API credits faster. This "token inflation" compounds with the model's increased "thinking" time, particularly when using the new Xhigh Effort mode, which sits between standard processing and maximum reasoning depth.
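A rough budget estimate shows what the tokenizer change means in practice. The sketch below applies the reported 10% to 35% range to a fixed workload at an unchanged price. The price of $10 per million tokens and the 50-million-token workload are placeholder assumptions for illustration, not Anthropic's actual rates, and the extra tokens consumed by higher effort modes are not modeled.

```python
def inflated_cost(baseline_tokens: int, price_per_million: float, inflation: float) -> float:
    """Dollar cost when the same input yields `inflation` (e.g. 0.10 = 10%) more tokens."""
    tokens = baseline_tokens * (1 + inflation)
    return tokens / 1_000_000 * price_per_million


PRICE_PER_MILLION = 10.0        # hypothetical price in dollars, not a published rate
MONTHLY_TOKENS = 50_000_000     # hypothetical workload under the old tokenizer

baseline = inflated_cost(MONTHLY_TOKENS, PRICE_PER_MILLION, 0.0)
low = inflated_cost(MONTHLY_TOKENS, PRICE_PER_MILLION, 0.10)
high = inflated_cost(MONTHLY_TOKENS, PRICE_PER_MILLION, 0.35)

print(f"baseline ${baseline:.2f}, new range ${low:.2f} to ${high:.2f}")
```

Because the per-token price is unchanged, the bill scales linearly with token count: under these placeholder numbers, a $500 monthly workload would land between $550 and $675.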

U.S. President Trump’s administration has recently emphasized the need for American leadership in AI safety, a sentiment Anthropic has mirrored by launching Opus 4.7 alongside a 232-page system card. Notably, the company has intentionally throttled the model’s advanced cybersecurity capabilities, keeping them below the levels seen in its Mythos Preview model to prevent misuse in offensive operations. This cautious deployment strategy highlights a growing tension in the Valley: the race to build a model that can replace a human worker while ensuring that the same model cannot dismantle digital infrastructure. As spot gold prices trade at $4,790.625 per ounce today, reflecting a market hedge against technological and geopolitical volatility, the arrival of Opus 4.7 confirms that the "Agent Era" has moved from a Silicon Valley pitch deck into a functional, billable reality.
