
Claude Sonnet 4.6

Claude Sonnet 4.6 offers frontier performance for coding and computer use, with a 1M-token context window, at $3 per million input tokens.

Agentic AI · Multimodal · Coding · Computer Use · Long Context
Anthropic · Claude 4 family · February 17, 2026
Context: 1.0M tokens
Max Output: 128K tokens
Input Price: $3.00 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image, Audio
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
89.9%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude Sonnet 4.6 scored 89.9% on this benchmark.
HLE
51%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning more than a hundred specialized subjects, designed to remain difficult as models saturate older tests. Claude Sonnet 4.6 scored 51% on this benchmark.
MMLU
89.3%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude Sonnet 4.6 scored 89.3% on this benchmark.
MMLU Pro
79.2%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude Sonnet 4.6 scored 79.2% on this benchmark.
SimpleQA
45%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and susceptibility to hallucination in knowledge retrieval tasks. Claude Sonnet 4.6 scored 45% on this benchmark.
IFEval
95%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude Sonnet 4.6 scored 95% on this benchmark.
AIME 2025
94%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Claude Sonnet 4.6 scored 94% on this benchmark.
MATH
97.8%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude Sonnet 4.6 scored 97.8% on this benchmark.
GSM8k
99.1%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude Sonnet 4.6 scored 99.1% on this benchmark.
MGSM
98%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude Sonnet 4.6 scored 98% on this benchmark.
MathVista
70%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude Sonnet 4.6 scored 70% on this benchmark.
SWE-Bench
79.6%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude Sonnet 4.6 scored 79.6% on this benchmark.
HumanEval
98%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude Sonnet 4.6 scored 98% on this benchmark.
LiveCodeBench
80%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude Sonnet 4.6 scored 80% on this benchmark.
MMMU
83.6%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude Sonnet 4.6 scored 83.6% on this benchmark.
MMMU Pro
77%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude Sonnet 4.6 scored 77% on this benchmark.
ChartQA
92%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude Sonnet 4.6 scored 92% on this benchmark.
DocVQA
94%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude Sonnet 4.6 scored 94% on this benchmark.
Terminal-Bench
53%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude Sonnet 4.6 scored 53% on this benchmark.
ARC-AGI
58.3%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude Sonnet 4.6 scored 58.3% on this benchmark.

About Claude Sonnet 4.6

Learn about Claude Sonnet 4.6's capabilities, features, and how it can help you achieve better results.

High-Performance Agentic Intelligence

Claude Sonnet 4.6 is Anthropic's most versatile model, designed to act as a primary engine for complex enterprise workflows and autonomous agents. Released on February 17, 2026, it introduces human-level computer use capabilities and a 1-million-token context window. The model architecture balances the speed of mid-tier systems with the reasoning depth typically reserved for the Opus class, making it a sustainable choice for high-volume production environments.

Adaptive Thinking and Multimodality

At its technical core, Sonnet 4.6 utilizes an Adaptive Thinking mechanism. This allows developers to scale the internal reasoning effort based on the specific requirements of a task, optimizing for either sub-second latency or deep logical verification. The model is natively multimodal, offering state-of-the-art performance in processing text, high-resolution images, and audio files. It excels at interpreting dense technical documentation and complex visual data, such as architectural blueprints or financial charts.
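As a rough sketch of how that effort dial might be used, the payloads below mirror the `thinking` parameter shown in this page's SDK snippet. The exact field names and accepted values are assumptions, so confirm them against the official API reference.

```typescript
// Two request payloads illustrating the Adaptive Thinking trade-off described
// above: a fast, low-effort call versus a deep, high-effort call. The
// `thinking` shape here mirrors this page's SDK snippet and is illustrative.
type ThinkingEffort = "low" | "medium" | "high";

interface MessageRequest {
  model: string;
  max_tokens: number;
  thinking?: { type: "adaptive"; effort: ThinkingEffort };
  messages: { role: "user" | "assistant"; content: string }[];
}

// Fast path: optimize for sub-second latency on a simple query.
const quick: MessageRequest = {
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  thinking: { type: "adaptive", effort: "low" },
  messages: [{ role: "user", content: "Summarize this changelog entry." }],
};

// Deep path: high-effort reasoning for logical verification.
const deep: MessageRequest = {
  ...quick,
  max_tokens: 8192,
  thinking: { type: "adaptive", effort: "high" },
  messages: [{ role: "user", content: "Verify this proof step by step." }],
};
```

The point is that the same endpoint serves both regimes; only the effort setting and token budget change between calls.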

The Industry Standard for Coding

With a record-breaking 79.6% on SWE-bench Verified, Sonnet 4.6 has become the default choice for software engineering automation. Its ability to reason across vast codebases within its 1M context window allows it to resolve multi-file bugs and plan architectural refactors with minimal human intervention. By offering near-Opus level intelligence at $3 per million input tokens, it removes the financial barriers previously associated with deploying truly autonomous AI systems.

Claude Sonnet 4.6

Use Cases

Discover the different ways you can use Claude Sonnet 4.6 to achieve great results.

Autonomous Software Engineering

Resolving complex multi-file GitHub issues and executing full-repository refactors using its 79.6% SWE-bench accuracy.

Human-Level Computer Use

Directly navigating desktop software and web interfaces to complete multi-step administrative tasks without custom API integrations.
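A computer-use request might look like the sketch below. The tool `type` string follows the pattern of Anthropic's earlier computer-use betas and is hypothetical here; check the current docs for the exact identifier and any required beta headers.

```typescript
// Illustrative computer-use tool declaration. The version tag in `type` is
// hypothetical; only the overall shape follows earlier Claude computer-use betas.
interface ComputerTool {
  type: string;
  name: "computer";
  display_width_px: number;
  display_height_px: number;
}

const computerTool: ComputerTool = {
  type: "computer_20260217", // hypothetical version tag
  name: "computer",
  display_width_px: 1280,
  display_height_px: 800,
};

const computerRequest = {
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  tools: [computerTool],
  messages: [
    { role: "user", content: "Open the invoicing app and export last month's report." },
  ],
};
```

At runtime the model returns tool-use actions (clicks, keystrokes, screenshots) that your agent loop executes against the real desktop.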

Large-Scale Document Analysis

Reviewing thousands of pages of legal contracts or research papers simultaneously within the 1-million-token context window.
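Packing many documents into one long-context request still requires budgeting against the window. A minimal sketch, using a crude characters-divided-by-four token estimate (a real pipeline should use a tokenizer):

```typescript
// Greedily pack documents into a single request without exceeding the
// context budget. The chars/4 heuristic is a rough stand-in for tokenization.
const CONTEXT_BUDGET = 1_000_000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packDocuments(docs: string[], budget = CONTEXT_BUDGET): string[] {
  const packed: string[] = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc);
    if (used + cost > budget) break; // stop before overflowing the window
    packed.push(doc);
    used += cost;
  }
  return packed;
}
```

Anything that doesn't fit can go into a second pass, with the first pass's summary carried forward.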

Financial Intelligence and Forecasting

Processing earnings calls and quarterly reports to identify subtle market anomalies using high-effort adaptive reasoning.

Multimodal Technical Support

Interpreting complex technical diagrams, circuit board photos, and audio recordings to provide precise troubleshooting steps.

Agentic Business Strategy

Planning and executing long-horizon operations by leveraging top-tier scores on strategy and logic-based benchmarks.

Strengths

Elite Coding Accuracy: Sets the industry standard with 79.6% on SWE-bench Verified, outperforming all other mid-tier and most flagship models.
Unrivaled Context Capacity: The 1-million-token window allows for the ingestion of entire technical libraries or massive codebases without performance degradation.
Autonomous Computer Use: Achieves a 72.5% score on OSWorld, enabling the model to navigate complex GUIs and software tools as a virtual operator.
Optimized Price-Performance: Delivers near-Opus intelligence at 40% lower cost, making it an economical choice for large-scale agent deployments.

Limitations

Lack of Native Video Input: Requires manual frame extraction for visual processing of video files, adding complexity to media workflows.
Increased Reasoning Latency: High-effort adaptive reasoning significantly increases time-to-first-token compared to standard inference.
High Output Reasoning Costs: While input pricing is competitive, max-effort reasoning tasks can consume large amounts of output tokens, increasing costs.
Context Retrieval Noise: At the 1M token limit, the model can occasionally lose focus if the context is filled with irrelevant data.

API Quick Start

anthropic/claude-sonnet-4-6

View Documentation
anthropic SDK
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // set in your environment; never hardcode
});

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6-20260217",
  max_tokens: 4096,
  // Adaptive Thinking: dial reasoning effort up for complex analysis
  thinking: { type: "adaptive", effort: "high" },
  messages: [
    { role: "user", content: "Analyze this repository for architectural bottlenecks." }
  ],
});

// With thinking enabled, the response may lead with a thinking block,
// so pick the first text block rather than assuming index 0.
const textBlock = response.content.find((block) => block.type === "text");
console.log(textBlock?.text);

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Claude Sonnet 4.6

Context is noise. Bigger token windows are a trap. Give agents only the narrow, curated signal they need.
Logical-Storm-1180
reddit
This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, and agent planning.
Claude
twitter
The performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary; it's hard to overstate how fast these models are evolving.
Replit
youtube
Sonnet 4.6 is now live in Claude Code. It's cheaper than Opus 4.6 and nears Opus-level intelligence.
Boris Cherny
twitter
Claude 4.6 is the new leader in agentic performance, slightly ahead of Opus 4.6 on real-world knowledge work tasks.
Artificial Analysis
twitter
The fact that this model can navigate a computer interface with 72% accuracy basically ends the need for most bespoke APIs.
DevOpsGuru
hackernews

Related Videos

Watch tutorials, reviews, and discussions about Claude Sonnet 4.6

Sonnet 4.6 is here and it may replace Opus for 90% of what you do day-to-day.

But the best part: it's 40% cheaper than using Opus 4.6.

The SWE-bench results are actually unbelievable for a mid-tier model.

You can effectively feed it an entire codebase and it doesn't lose the plot.

Adaptive thinking effort allows you to trade off speed for deeper logic.

Early users are reporting near human-like performance on complex spreadsheet manipulation.

This model is around twice as fast compared to Opus.

The 1 million token context window is currently in beta but works very well.

It navigates software interfaces without needing specific API integrations.

The coding capability on Python and JavaScript is basically at the ceiling.

Anthropic says the new context window is big enough to hold entire code bases and reason effectively across all that context.

Opus 4.6 is the nuclear bomb option... but now we finally have a scalpel which is awesome news.

Computer use is the standout feature here, actually moving the mouse and typing.

Financial analysts are going to love the reasoning depth for document review.

It's the first time a 'Sonnet' model has felt like the absolute best in class.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Claude Sonnet 4.6 and achieve better results.

Optimize Thinking Effort

Use the 'adaptive' thinking mode to save costs on simple queries while reserving 'max' effort for math and logic tasks.
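One way to apply this tip is a small routing table that picks an effort level per task class before each call. Task categories and the 'max' value here are illustrative assumptions, not official API constants.

```typescript
// Hypothetical cost-aware router implementing the tip above: cheap effort
// for simple queries, with 'max' reserved for math and logic tasks.
type Task = "lookup" | "summarize" | "code" | "math";
type Effort = "low" | "medium" | "high" | "max";

const effortByTask: Record<Task, Effort> = {
  lookup: "low",       // sub-second latency matters most
  summarize: "medium",
  code: "high",
  math: "max",         // deep logical verification
};

function pickEffort(task: Task): Effort {
  return effortByTask[task];
}
```

The returned value would then be passed through as the `effort` field of the request's `thinking` parameter.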

Implement Context Compaction

Enable prompt caching and compaction features to handle the 1M token window efficiently without redundant costs.
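A caching setup might mark the large, stable prefix of the prompt for reuse. The `cache_control` block shape below follows Anthropic's existing prompt-caching API; treat the details as illustrative and confirm against current docs.

```typescript
// Sketch of prompt caching on a large, stable prefix: the system block is
// cached and reused across calls, so only the changing user turn is
// reprocessed at full cost on subsequent requests.
const cachedRequest = {
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  system: [
    {
      type: "text",
      text: "<entire style guide or codebase summary goes here>",
      cache_control: { type: "ephemeral" }, // marks this prefix as cacheable
    },
  ],
  messages: [
    { role: "user", content: "Review the attached diff against the style guide." },
  ],
};
```

Pair this with context compaction (summarizing stale turns) so the cached prefix stays stable as the conversation grows.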

Structured Behavioral Anchoring

Utilize a central project markdown file to maintain a persistent source of truth for the model's architectural decisions.
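The anchoring pattern can be as simple as loading the project file and injecting it into every system prompt. The file name and contents below are purely illustrative.

```typescript
// Minimal sketch of behavioral anchoring: a persistent project markdown file
// (contents inlined here for the example) becomes part of every system prompt,
// so the model's standing architectural decisions survive across sessions.
const projectNotes = [
  "# PROJECT.md",
  "- Service boundaries: billing is isolated from auth.",
  "- All new endpoints must be idempotent.",
].join("\n");

function buildSystemPrompt(notes: string): string {
  return `You are the project's coding agent. Follow these standing decisions:\n\n${notes}`;
}

const systemPrompt = buildSystemPrompt(projectNotes);
```

In a real agent you would read the file from disk at session start and refresh it whenever the model records a new decision.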

Video Frame Extraction

Since native video is not supported, extract key frames at 1fps for the most accurate visual analysis of video content.
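A small helper can build the ffmpeg invocation for this (the `-vf fps=1` filter samples one frame per second); spawning the process and uploading the resulting images is left to your pipeline.

```typescript
// Build the ffmpeg argument list for sampling frames from a video at a given
// rate. Only the arguments are constructed here; run them with your process
// runner of choice, e.g. spawnSync("ffmpeg", args).
function ffmpegFrameArgs(input: string, outDir: string, fps = 1): string[] {
  return ["-i", input, "-vf", `fps=${fps}`, `${outDir}/frame_%04d.png`];
}

// e.g. ffmpegFrameArgs("demo.mp4", "frames")
//   -> ["-i", "demo.mp4", "-vf", "fps=1", "frames/frame_%04d.png"]
```

Each extracted frame can then be sent as an image content block alongside the user's question.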

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of our most-used RPA tools, both internally and externally. It saves us countless hours of work, and we realized it could do the same for other startups, so we chose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

google

Gemini 3 Flash

Google

Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.

1M context
$0.50/$3.00/1M
anthropic

Claude Opus 4.6

Anthropic

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

1M context
$5.00/$25.00/1M
alibaba

Qwen3.5-397B-A17B

Alibaba

Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...

1M context
$0.40/$2.40/1M
openai

GPT-5.1

OpenAI

GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

400K context
$1.25/$10.00/1M
google

Gemini 3 Pro

Google

Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.

1M context
$2.00/$12.00/1M
moonshot

Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context
$0.60/$3.00/1M
xai

Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context
$3.00/$15.00/1M
anthropic

Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M

Frequently Asked Questions

Find answers to common questions about Claude Sonnet 4.6