Claude Opus 4.6

Claude Opus 4.6 is Anthropic's flagship model, featuring a 1M token context window (in beta), Adaptive Thinking, and world-class coding and reasoning performance.

Tags: Reasoning, Coding, Multimodal, Agentic AI, Enterprise
Provider: Anthropic
Family: Claude
Released: February 5, 2026
Context: 200K tokens
Max Output: 128K tokens
Input Price: $5.00 / 1M tokens
Output Price: $25.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
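
The listed prices translate into a per-request cost with simple arithmetic. The sketch below assumes flat rates of $5.00 per 1M input tokens and $25.00 per 1M output tokens, as listed above, and ignores any discounts or surcharges (caching, batching, long-context tiers) that may apply. The helper name `estimateCostUsd` is hypothetical.

```typescript
// Prices per 1M tokens, taken from the spec list above.
const INPUT_PRICE_PER_MTOK = 5.0;
const OUTPUT_PRICE_PER_MTOK = 25.0;

// Hypothetical helper: estimated cost of one request in US dollars,
// assuming flat per-token rates.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK;
  return inputCost + outputCost;
}

// Example: a 200K-token prompt with an 8K-token reply.
console.log(estimateCostUsd(200_000, 8_000).toFixed(2));
```

Because output tokens cost five times as much as input tokens, long generations tend to dominate the bill even when the prompt is large.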
Benchmarks

GPQA: 91%
Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts achieve only 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence "Google-proof").

HLE: 53%
Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge.

MMLU: 91%
Massive Multitask Language Understanding. A comprehensive benchmark with about 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities.

MMLU Pro: 82%
MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple-choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science.

SimpleQA: 72%
Factual Accuracy Benchmark. Tests a model's ability to give accurate answers to short, fact-seeking questions. Measures reliability and resistance to hallucination in knowledge retrieval tasks.

IFEval: 94%
Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements.

AIME 2025: 100%
American Invitational Mathematics Examination. Competition-level mathematics problems from the AIME, an exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching.

MATH: 93%
Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge.

GSM8k: 99%
Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations.

MGSM: 96%
Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages.

MathVista: 75%
Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning.

SWE-Bench: 81%
Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024.

HumanEval: 95%
Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy.

LiveCodeBench: 76%
Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, it uses fresh problems to limit data contamination.

MMMU: 77%
Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge.

MMMU Pro: 77%
MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels.

ChartQA: 89%
Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations.

DocVQA: 93%
Document Visual Question Answering. Tests the ability to extract and reason about information from document images including forms, reports, and scanned text.

Terminal-Bench: 65%
Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills.

ARC-AGI: 69%
Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization.

About Claude Opus 4.6

Learn about Claude Opus 4.6's capabilities, features, and how it can help you achieve better results.

The New Frontier of Intelligence

Claude Opus 4.6 represents a significant leap in large language model capabilities, specifically engineered for the most demanding cognitive tasks. Released on February 5, 2026, it introduces Adaptive Thinking, a breakthrough feature that allows the model to dynamically scale its reasoning effort based on query complexity. This ensures that while simple queries remain efficient, complex logic puzzles and high-stakes engineering tasks receive the deep processing they require.

Built for the Agentic Era

Designed to go beyond simple chat, Opus 4.6 is a powerhouse for autonomous agentic workflows. With industry-leading scores on Terminal-Bench 2.0 and SWE-Bench Verified, it can navigate computer environments, manage multi-step software debugging, and orchestrate complex projects with minimal human intervention. Its expanded 1 million token context window (available in beta) allows it to hold entire technical ecosystems in memory simultaneously.

Use Cases for Claude Opus 4.6

Discover the different ways you can use Claude Opus 4.6 to achieve great results.

Autonomous Agent Workflows

Orchestrating multi-step agentic tasks across visual desktop environments using OSWorld-level reasoning.

Full-Stack Vibe Coding

Generating entire functional applications like 3D games or complex dashboards from a single high-level prompt.

Large-Scale Repo Management

Analyzing and refactoring massive codebases using the 1M token context window and Model Context Protocol.

Deep Scientific Research

Synthesizing PhD-level information across biology, chemistry, and physics with elite GPQA Diamond performance.

Expert Financial Analysis

Performing agentic financial modeling and multi-source data synthesis for enterprise-grade decision making.

Long-Horizon Planning

Managing complex, month-long projects or simulations requiring consistent tool usage and task adherence.

Strengths

Elite Agentic Reasoning: State-of-the-art performance on Terminal-Bench 2.0 (65%) and OSWorld for autonomous agents.
Massive Context Capacity: The 1M token window (beta) allows for the processing of entire libraries or large software repositories without loss of focus.
Dynamic Adaptive Thinking: The ability to scale reasoning effort ensures optimal performance for both quick queries and deep mathematical problems.
Superior Technical Mastery: Exceptional math and science capabilities, scoring a perfect 100% on AIME 2025 and 91% on GPQA Diamond.

Limitations

Higher API Latency: When using maximum reasoning effort or massive context windows, the model can be significantly slower than Sonnet variants.
Premium Pricing Model: At $5/$25 per million input/output tokens, it remains a high-cost option for developers compared to optimized flash or small models.
Integration Complexity: Features like Adaptive Thinking require updated API implementations and deeper knowledge of Anthropic's specific toolsets.
Limited Multimodal Output: While vision input is world-class, the model currently lacks native real-time audio and video generation capabilities.

API Quick Start

anthropic/claude-opus-4-6

anthropic SDK
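import Anthropic from '@anthropic-ai/sdk';

// The client reads the API key from the environment.
const anthropic = new Anthropic({
  apiKey: process.env['ANTHROPIC_API_KEY'],
});

// Stream the response: with a max_tokens value this large, a non-streaming
// request can run past the SDK's default timeout.
const stream = anthropic.messages.stream({
  model: "claude-4-6-opus-20260205",
  max_tokens: 128000,
  messages: [
    { role: "user", content: "Create a fully functional 3D physics simulator using Three.js." }
  ],
});

const message = await stream.finalMessage();

// Content blocks can be of several types; print only the text blocks.
for (const block of message.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}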

Install the SDK and start making API calls in minutes.

What People Are Saying About Claude Opus 4.6

See what the community thinks about Claude Opus 4.6

"Claude Opus 4.6 is shockingly powerful. Think Deep Research + advanced reasoning + serious coding ability."
Awa K. Penn (X)

"Surpassing GPT-5.2 xhigh reasoning... huge jump from Opus 4.5’s 4.6% score!"
Minyang Tian (X)

"This model is very strong for coding right now... doesn’t get lost in details."
Dinmukhanbet Aizharykov (X)

"I've been using it for a week and the context retention is actually scary good."
CodeMaster99 (Reddit)

"The adaptive thinking is a game changer for cost management on complex tasks."
AI_Strategy_Expert (Hacker News)

"Opus 4.6 is basically an AGI intern that actually listens to your feedback."
TechVlogger2026 (YouTube)

Videos About Claude Opus 4.6

Watch tutorials, reviews, and discussions about Claude Opus 4.6

This model took the lead over every other frontier system out there... it's a different weight class entirely.

Think about a massive library of documents and the software actually remembers the footnote on page 400.

The model actually decides how hard it needs to work based on the difficulty... shifting gears.

It's the first time I've seen an AI really understand the 'vibe' of a complex engineering requirement.

This is clearly built for enterprise developers who need zero-shot accuracy over speed.

Claude has a new flagship model with Opus 4.6... Spoiler alert, it's just better than anything I've seen yet.

This model is just so much more autonomous than anything before... agentic power is real.

My personal feeling was that this is going to be Opus 5. That's how much I liked how it behaved.

It feels like they finally solved the 'drifting' issue in long conversations.

The adaptive thinking toggle is the most underrated feature of 2026.

It's Opus 4.6, which personally I'm more excited about cuz I always use the Opus models.

It gave me all these nice controls... This is the best result for this by far by a very large margin.

A single prompt... made a fully functioning game that I could see being like released on Steam.

The way it calls tools is so much more reliable now, it doesn't hallucinate arguments.

For heavy coding projects, this has officially replaced my previous setup entirely.

Pro Tips for Claude Opus 4.6

Expert tips to help you get the most out of Claude Opus 4.6 and achieve better results.

Leverage Adaptive Thinking

Use the thinking parameter to toggle between effort levels to balance cost and cognitive depth for different tasks.
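
One way to picture this tip is a small helper that maps an effort level to a request body. The sketch below is an assumption, not the documented Adaptive Thinking interface: it uses the `thinking: { type: "enabled", budget_tokens }` field from Anthropic's extended thinking API, and the effort names, budget values, and `buildRequest` helper are hypothetical. The exact parameter shape for Opus 4.6 should be checked against Anthropic's documentation.

```typescript
type Effort = "low" | "medium" | "high";

// Hypothetical thinking budgets in tokens; 0 means no thinking block is sent.
const THINKING_BUDGET: Record<Effort, number> = {
  low: 0,
  medium: 4096,
  high: 32000,
};

interface RequestParams {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
  // Assumed shape, borrowed from the extended thinking request field.
  thinking?: { type: "enabled"; budget_tokens: number };
}

// Builds a request body whose thinking budget depends on the effort level.
function buildRequest(prompt: string, effort: Effort): RequestParams {
  const budget = THINKING_BUDGET[effort];
  const params: RequestParams = {
    model: "claude-4-6-opus-20260205", // model ID as used in the quick start
    max_tokens: budget + 4096, // the budget must stay below max_tokens
    messages: [{ role: "user", content: prompt }],
  };
  if (budget > 0) {
    params.thinking = { type: "enabled", budget_tokens: budget };
  }
  return params;
}

console.log(JSON.stringify(buildRequest("Prove the claim step by step.", "high"), null, 2));
```

Routing cheap lookups through the low level and reserving the high level for proofs or multi-file debugging keeps the thinking tokens, which are billed as output, where they pay off.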

Context Compaction

For long-running agentic tasks, enable the beta context compaction feature to maintain performance without exceeding token limits.
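
This page does not show how the beta compaction feature is enabled, so the sketch below is a client-side stand-in rather than that feature: it simply drops the oldest turns until the remaining history fits a token budget. The roughly 4 characters per token estimate and the `trimHistory` helper are assumptions for illustration.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the most recent messages whose combined estimate fits the budget.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  // Walk backwards so the newest turns are kept first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  // A conversation should start with a user turn, so drop leading assistant turns.
  while (kept.length > 0 && kept[0].role !== "user") {
    kept.shift();
  }
  return kept;
}
```

Plain trimming loses whatever the dropped turns contained. Summarizing them into a single message before discarding is a common refinement when early context still matters.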

Utilize MCP Tools

Pair Opus 4.6 with the Model Context Protocol to give the model secure access to local filesystems and databases.

One-Shot Complex Apps

Provide a comprehensive system prompt; Opus 4.6 is capable of generating 1,000+ line files accurately in one go.

Related AI Models

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context, $0.28/$0.42 per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context, $0.50/$3.00 per 1M tokens

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context, $2.00/$12.00 per 1M tokens

Kimi K2 Thinking (Moonshot)
Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...
256K context, $0.15 per 1M tokens

GPT-5.2 (OpenAI)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
400K context, $1.75/$14.00 per 1M tokens

GPT-5.2 Pro (OpenAI)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context, $21.00/$168.00 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25/$10.00 per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context, $0.60/$2.50 per 1M tokens

Frequently Asked Questions About Claude Opus 4.6

Find answers to common questions about Claude Opus 4.6