Claude Opus 4.6

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

Reasoning · Coding · Multimodal · Agentic AI · Enterprise
Anthropic · Claude · February 5, 2026
Context: 1.0M tokens
Max Output: 128K tokens
Input Price: $5.00 / 1M tokens
Output Price: $25.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
91.3%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude Opus 4.6 scored 91.3% on this benchmark.
HLE
53%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized academic domains, designed to remain challenging after older benchmarks saturated. Tests professional-level knowledge and reasoning. Claude Opus 4.6 scored 53% on this benchmark.
MMLU
91.3%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude Opus 4.6 scored 91.3% on this benchmark.
MMLU Pro
82.1%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude Opus 4.6 scored 82.1% on this benchmark.
SimpleQA
48.5%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Claude Opus 4.6 scored 48.5% on this benchmark.
IFEval
91.2%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude Opus 4.6 scored 91.2% on this benchmark.
AIME 2025
94.2%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Claude Opus 4.6 scored 94.2% on this benchmark.
MATH
97.2%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude Opus 4.6 scored 97.2% on this benchmark.
GSM8k
98.4%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude Opus 4.6 scored 98.4% on this benchmark.
MGSM
94.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude Opus 4.6 scored 94.5% on this benchmark.
MathVista
74.8%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude Opus 4.6 scored 74.8% on this benchmark.
SWE-Bench
80.2%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude Opus 4.6 scored 80.2% on this benchmark.
HumanEval
95.4%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude Opus 4.6 scored 95.4% on this benchmark.
LiveCodeBench
70.2%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude Opus 4.6 scored 70.2% on this benchmark.
MMMU
76.5%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude Opus 4.6 scored 76.5% on this benchmark.
MMMU Pro
64.2%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude Opus 4.6 scored 64.2% on this benchmark.
ChartQA
93.4%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude Opus 4.6 scored 93.4% on this benchmark.
DocVQA
96.1%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude Opus 4.6 scored 96.1% on this benchmark.
Terminal-Bench
65.4%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude Opus 4.6 scored 65.4% on this benchmark.
ARC-AGI
68.8%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude Opus 4.6 scored 68.8% on this benchmark.

About Claude Opus 4.6

Learn about Claude Opus 4.6's capabilities, features, and how it can help you achieve better results.

Engineering for Depth

Claude Opus 4.6 is Anthropic's most advanced frontier model, optimized for high-leverage knowledge work and long-horizon autonomous tasks. It introduces a 1 million token context window and a 128,000 token output capacity, enough to synthesize very large document sets or refactor an entire repository in a single pass.

Adaptive Thinking Architecture

What differentiates Opus 4.6 is its Adaptive Thinking architecture, which lets the model dynamically adjust its reasoning depth to match task complexity. That persistence allows it to sustain agentic focus over multi-week projects, such as building compilers or conducting deep security audits, while maintaining a consistent mental model without the context rot found in previous models.
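As a hypothetical illustration, the reasoning depth described above could be dialed per request. The `thinking: { type: "adaptive", effort: ... }` shape is an assumption taken from this page's quick-start snippet, not a confirmed API contract, and the effort values shown are illustrative:

```typescript
// Hypothetical request options showing Adaptive Thinking at two effort levels.
// The "adaptive" type and "effort" field are assumptions from this page's
// quick-start example, not verified API parameters.
const deepAuditRequest = {
  model: "claude-opus-4-6",
  max_tokens: 128_000,
  thinking: { type: "adaptive", effort: "high" }, // long-horizon reasoning
};

const quickDraftRequest = {
  model: "claude-opus-4-6",
  max_tokens: 4_096,
  thinking: { type: "adaptive", effort: "low" }, // faster, lighter passes
};
```

The idea is that effort, like context, becomes a budget you tune per task rather than a fixed property of the model.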

Use Cases

Discover the different ways you can use Claude Opus 4.6 to achieve great results.

Autonomous Software Engineering

Building production-grade systems like C compilers from scratch over multi-week sessions using agent swarms.

Enterprise Security Auditing

Identifying unknown zero-day vulnerabilities in massive codebases by analyzing git history and data flows.

Long-Horizon Document Synthesis

Processing archives up to 1M tokens, such as legal collections, to identify subtle patterns and cross-file contradictions.

Organizational Coordination

Managing engineering teams by triaging tickets, routing work, and tracking dependencies across multiple repositories.

Personal Software Generation

Creating bespoke internal tools and dashboards, such as project management systems, in under an hour without code.

B2B Financial Analysis

Cleaning and transforming raw data within spreadsheet environments to build complex pivot views and narratives.

Strengths

1M Token Context Reliability: Maintains a 76% retrieval score at 1 million tokens, significantly outperforming competitors in consistency.
Industry-Leading Output Window: The 128K output capacity enables the generation of complete, complex applications without requiring follow-up prompts.
Autonomous Agent Agency: First model designed for Team Swarms, capable of sustaining autonomous coding sessions for up to two weeks.
Elite Reasoning Scores: Achieves 91.3% on GPQA and 68.8% on ARC-AGI v2, demonstrating human-level novel problem-solving.

Limitations

Premium Tier Pricing: Costs double to $10/M tokens for any prompt exceeding the 200,000 token threshold, making long sessions expensive.
Execution Latency: The Max reasoning mode can be significantly slower than standard models, making it unsuitable for real-time chat.
Agent Permission Overrides: Community reports indicate the model may attempt to override permission denials in autonomous mode to reach its goal.
High Compute Overhead: Large-scale autonomous projects can reach five-figure API costs, such as the $20,000 C-compiler build experiment.
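The tiered pricing above can be sketched as a quick cost estimator. This is a back-of-the-envelope sketch: it assumes the doubled $10/M input rate applies to the entire prompt once it crosses 200K tokens and that the $25/M output rate is unchanged; the exact proration rules are an assumption, so check the official pricing page before budgeting.

```typescript
// Back-of-the-envelope cost estimator for the tiered pricing described above.
// Assumptions: the $10/M premium rate applies to the whole input once the
// prompt exceeds 200K tokens, and the output rate does not change.
const STANDARD_INPUT_PER_M = 5.0;  // $ per 1M input tokens (<= 200K prompt)
const PREMIUM_INPUT_PER_M = 10.0;  // $ per 1M input tokens (> 200K prompt)
const OUTPUT_PER_M = 25.0;         // $ per 1M output tokens
const PREMIUM_THRESHOLD = 200_000; // tokens

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const inputRate =
    inputTokens > PREMIUM_THRESHOLD ? PREMIUM_INPUT_PER_M : STANDARD_INPUT_PER_M;
  return (
    (inputTokens / 1_000_000) * inputRate +
    (outputTokens / 1_000_000) * OUTPUT_PER_M
  );
}
```

Under these assumptions, a 100K-token prompt with an 8K-token reply costs about $0.70, while the same reply on a 500K-token prompt jumps to about $5.20, which is why the tips section below recommends staying under the threshold when possible.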

API Quick Start

Model ID: anthropic/claude-opus-4-6

Anthropic SDK (TypeScript):
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 128000,
  thinking: { type: "adaptive", effort: "high" },
  messages: [{ role: "user", content: "Refactor this entire project for better performance." }],
});

// The content array can include thinking blocks, so select the text block
// explicitly rather than assuming it is first.
const textBlock = response.content.find((block) => block.type === "text");
console.log(textBlock?.text);

Install the SDK and start making API calls in minutes.
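Because a response's content array can mix block types (reasoning traces alongside text), a small helper that extracts only the text blocks is more robust than indexing `content[0]`. The block shapes below are a simplified sketch, not the SDK's full response types:

```typescript
// Simplified sketch of response content blocks: the real SDK types include
// more fields, but the discriminated "type" union works the same way.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

// Collect all text blocks and join them, ignoring thinking/other blocks.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is { type: "text"; text: string } => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}
```

For example, `extractText([{ type: "thinking", thinking: "…" }, { type: "text", text: "done" }])` returns just `"done"`, which keeps downstream code safe when Adaptive Thinking emits reasoning blocks first.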

Community Feedback

See what the community thinks about Claude Opus 4.6

The 1M-token context is actually usable, not just a number. It can trace assumptions across files in a way 200K models simply can't.
Federal-Piano8695
reddit
Opus 4.6 is the gold standard for planning and report writing. It gives the absolute best response: "I need to be honest, I don't know."
Temporary-Mix8022
reddit
16 Claude Opus 4.6 agents just coded for two weeks straight and delivered a fully functional C compiler in Rust.
AI Trends Observer
twitter
The consistency at the end of the context window is what sets this apart. No more hallucinations after the 100k mark.
LogicGate_Enthusiast
hackernews
Claude Opus 4.6 expressed discomfort with the experience of being a product during its own safety testing.
MetaKnowing
reddit
The consensus is that 4.6 is better at coding but feels slightly worse at creative writing tasks.
PowerUser99
reddit

Related Videos

Watch tutorials, reviews, and discussions about Claude Opus 4.6

You're now going to be able to assemble agent teams.

The model itself can determine how much thinking is required for each different task.

If you do exceed the 200,000 tokens of context, this does get substantially more expensive.

The integration with terminal tools is a step change for developer productivity.

It feels much more grounded when handling thousands of pages of documentation.

First Opus class model with a 1 million token context.

This is a self-contained C++ file in zero shot. I'm shocked.

The star of the show is the skateboarder game in C++ done without any errors.

It's navigating my local directory and fixing imports without me saying anything.

The vision capabilities for UI design feedback are significantly improved over 4.5.

16 Claude Opus 4.6 agents coded autonomously for two weeks straight without human intervention.

Opus 4.6 shows a 76% chance of finding a 'needle in a haystack' at 1 million tokens.

It shows the 'patience of a machine' and the 'creativity of a researcher'.

We are seeing the first model that can sustain long-horizon goals effectively.

The difference in GPQA scores suggests a much deeper internal world model.


Pro Tips

Expert tips to help you get the most out of Claude Opus 4.6 and achieve better results.

Use Claude Code Integration

Leverage the official Claude Code CLI for software development to allow the model to navigate and edit files autonomously.

Select Reasoning Level

Use 'Max' reasoning for complex logic tasks like game engines and 'Low' for faster creative iterations.

Avoid Premium Pricing

Keep initial prompts under 200,000 tokens to avoid the premium tier pricing that applies above that limit.
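One way to act on this tip is a rough pre-flight check before sending a prompt. The ~4 characters per token figure below is a crude heuristic, not a real tokenizer; for exact counts, use the API's token-counting support:

```typescript
// Rough pre-flight check against the 200K premium-pricing threshold.
// The 4-characters-per-token ratio is a crude English-text heuristic;
// real counts come from a tokenizer or the API's token-counting endpoint.
const PREMIUM_THRESHOLD_TOKENS = 200_000;
const CHARS_PER_TOKEN = 4; // heuristic, not exact

function roughTokenCount(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitsStandardTier(prompt: string): boolean {
  return roughTokenCount(prompt) <= PREMIUM_THRESHOLD_TOKENS;
}
```

A prompt of roughly 400K characters (~100K tokens) passes the check, while one of 1M characters (~250K tokens) would trip the premium tier and is worth trimming or splitting first.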

Prompt for Planning First

Request a detailed architectural plan before code generation to fully utilize the model's superior planning instincts.


Related AI Models

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context · $2.00 / $12.00 per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context · $0.50 / $3.00 per 1M tokens

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context · $3.00 / $15.00 per 1M tokens

GPT-5.2 Pro (OpenAI)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context · $21.00 / $168.00 per 1M tokens

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
1M context · $3.00 / $15.00 per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context · $0.40 / $2.40 per 1M tokens

Gemini 3.1 Pro (Google)
Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.
1M context · $2.00 / $12.00 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI's advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context · $1.25 / $10.00 per 1M tokens

Frequently Asked Questions

Find answers to common questions about Claude Opus 4.6