Claude Sonnet 4.6

Claude Sonnet 4.6 offers frontier performance for coding and computer use, with a 1M-token context window, at $3 per 1M input tokens and $15 per 1M output tokens.

Tags: Agentic AI, Multimodal, Coding, Computer Use, Long Context
Anthropic | Claude | Released February 17, 2026
Context: 1.0M tokens
Max Output: 64K tokens
Input Price: $3.00 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming, Reasoning
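
The listed prices translate into a simple per-request estimate. The sketch below is only an illustration: `estimateCostUsd` is a hypothetical helper (not part of any SDK), and it assumes the flat rates shown above apply to every token, with no tiering, caching, or batch discounts.

```typescript
// List prices taken from this page: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
const INPUT_USD_PER_MILLION = 3.0;
const OUTPUT_USD_PER_MILLION = 15.0;

// Hypothetical helper (not part of any SDK): estimates the cost of one request in USD.
// Assumes flat per-token pricing with no discounts or surcharges.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('token counts must be non-negative');
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  );
}

// Example: a 200,000-token prompt with a 4,000-token reply.
// 0.2 * 3.00 + 0.004 * 15.00 = 0.60 + 0.06
console.log(estimateCostUsd(200_000, 4_000).toFixed(2)); // prints "0.66"
```

Output tokens cost five times as much as input tokens at these rates, so long generated replies can dominate the bill even when prompts are large.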
Benchmarks
GPQA
89.9%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude Sonnet 4.6 scored 89.9% on this benchmark.
HLE
49%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning many specialized academic domains. It tests expert-level reasoning on problems that require deep, professional-level knowledge. Claude Sonnet 4.6 scored 49% on this benchmark.
MMLU
89.3%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude Sonnet 4.6 scored 89.3% on this benchmark.
MMLU Pro
79.2%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude Sonnet 4.6 scored 79.2% on this benchmark.
SimpleQA
48.5%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and the tendency to hallucinate in knowledge-retrieval tasks. Claude Sonnet 4.6 scored 48.5% on this benchmark.
IFEval
89.5%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude Sonnet 4.6 scored 89.5% on this benchmark.
AIME 2025
83%
AIME 2025: American Invitational Mathematics Examination. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Claude Sonnet 4.6 scored 83% on this benchmark.
MATH
85.3%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude Sonnet 4.6 scored 85.3% on this benchmark.
GSM8k
96.4%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude Sonnet 4.6 scored 96.4% on this benchmark.
MGSM
92.8%
MGSM: Multilingual Grade School Math. A set of 250 GSM8k problems translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude Sonnet 4.6 scored 92.8% on this benchmark.
MathVista
68.7%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude Sonnet 4.6 scored 68.7% on this benchmark.
SWE-Bench
79.6%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects; the Verified subset uses human-validated tasks. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude Sonnet 4.6 scored 79.6% on this benchmark.
HumanEval
92.1%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude Sonnet 4.6 scored 92.1% on this benchmark.
LiveCodeBench
72.4%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude Sonnet 4.6 scored 72.4% on this benchmark.
MMMU
74.2%
MMMU: Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude Sonnet 4.6 scored 74.2% on this benchmark.
MMMU Pro
75.6%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude Sonnet 4.6 scored 75.6% on this benchmark.
ChartQA
88.1%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude Sonnet 4.6 scored 88.1% on this benchmark.
DocVQA
93.4%
DocVQA: Document Visual Question Answering. Tests the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude Sonnet 4.6 scored 93.4% on this benchmark.
Terminal-Bench
59.1%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude Sonnet 4.6 scored 59.1% on this benchmark.
ARC-AGI
58.3%
ARC-AGI: Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude Sonnet 4.6 scored 58.3% on this benchmark.

About Claude Sonnet 4.6

Learn about Claude Sonnet 4.6's capabilities, features, and how it can help you achieve better results.

A Generational Leap in Intelligence

Claude Sonnet 4.6 is Anthropic's most capable and versatile model to date, designed to serve as a high-performance workhorse for complex enterprise and developer workflows. Released on February 17, 2026, it represents a major generational leap over the 4.5 series, introducing human-level computer use capabilities and a massive 1-million-token context window in beta. The model is optimized for agentic tasks, meaning it doesn't just process text but can autonomously plan and execute multi-step operations across various software environments.

Technical Sophistication and Multimodality

Technically, Sonnet 4.6 bridges the gap between the speed of mid-tier models and the deep reasoning of the Opus class. It features Adaptive Thinking, which lets it scale its internal reasoning effort to the complexity of the task. The model has become the new default for Claude Free and Pro users, offering flagship-level intelligence in coding, financial analysis, and document comprehension. It is natively multimodal, accepting text, image, audio, and video inputs and handling a range of media-processing tasks with state-of-the-art accuracy.

The New Industry Standard for Agents

With its elite performance-to-cost ratio, Sonnet 4.6 is positioned as the primary engine for AI agents. It achieves industry-leading scores on SWE-bench Verified (79.6%) and OSWorld-Verified (72.5%), demonstrating its superior ability to navigate real-world software engineering issues and complex operating system tasks. By providing near-Opus intelligence at a fraction of the cost, it empowers developers to build autonomous systems that were previously computationally or financially prohibitive.

Use Cases for Claude Sonnet 4.6

Discover the different ways you can use Claude Sonnet 4.6 to achieve great results.

Autonomous Software Engineering

Using Claude Code to refactor entire repositories and implement complex features with repository-wide context.

Human-Level Computer Use

Automating legacy software and web workflows by seeing the screen and interacting via virtual mouse and keyboard.

Financial Document Comprehension

Analyzing thousands of pages of filings and tables to reason through complex investment strategies or risks.

Real-Time Business Simulation

Running agentic simulations where the model manages a virtual business and optimizes for profitability.

Multilingual Technical Writing

Generating technical documentation across dozens of languages while keeping it compliant with the architectural spec.

Frontend UI/UX Generation

Creating polished, modern dashboard interfaces with a focus on typography, color theory, and responsive layout.

Strengths

Industry-Leading Coding: Achieves a state-of-the-art 79.6% on SWE-bench Verified, outperforming competitors in resolving real GitHub issues.
Elite Performance-to-Cost Ratio: Delivers near-Opus intelligence at about 40% lower list price than Claude Opus 4.6 ($3/$15 versus $5/$25 per 1M tokens, as listed on this page), making it the most economical choice for large-scale automation.
Human-Level Computer Navigation: Scores 72.5% on OSWorld-Verified, showing massive improvement in navigating complex software without APIs.
Adaptive Reasoning Power: Features a Thinking mode whose reasoning effort developers can scale up for hard logic problems.

Limitations

Latency in Thinking Mode: High thinking token budgets increase time-to-first-token, making it less ideal for instant real-time chat.
Rate Limiting Friction: Free and Pro users hit aggressive message caps during intense sessions, necessitating shifts to the API.
Context Decay Above 150k: Despite the 1M window, the model can still occasionally lose specific details from the middle of very large prompts.
Prompt Injection Vulnerability: The Computer Use feature poses risks where malicious websites could attempt to hijack the model's virtual browser session.

API Quick Start

anthropic/claude-sonnet-4-6

anthropic SDK
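// Requires an ES module context (top-level await) and ANTHROPIC_API_KEY set in the environment.
import Anthropic from '@anthropic-ai/sdk';

// Create the client with the API key read from the environment.
const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Send a single user message; max_tokens caps the length of the reply.
const msg = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Analyze this codebase for security vulnerabilities.' }
  ],
});

// msg.content is an array of content blocks, not a plain string.
console.log(msg.content);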

Install the SDK with npm install @anthropic-ai/sdk and start making API calls in minutes.
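
The quick-start call logs `msg.content`, which is an array of content blocks and not a plain string. A minimal sketch of pulling out just the text follows. The `ContentBlock` interface and `extractText` helper are illustrative names defined locally for this example, modelling only the `type` and `text` fields; real responses carry more fields and block types.

```typescript
// Local, simplified model of a response content block (assumption: only the
// fields used here are declared; the real SDK types are richer).
interface ContentBlock {
  type: string;
  text?: string;
}

// Hypothetical helper: joins the text of all text blocks, skipping any
// non-text blocks such as tool calls.
function extractText(blocks: ContentBlock[]): string {
  return blocks
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('\n');
}

// Example with a hand-built response body, so no network call is needed.
const sampleBlocks: ContentBlock[] = [
  { type: 'text', text: 'No hard-coded secrets found.' },
  { type: 'tool_use' },
  { type: 'text', text: 'Two SQL queries are built by string concatenation.' },
];
console.log(extractText(sampleBlocks));
```

With a real response, something like `extractText(msg.content)` should work as long as the SDK's block types stay structurally compatible with the simplified interface above.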

What People Are Saying About Claude Sonnet 4.6

See what the community thinks about Claude Sonnet 4.6

"Claude Sonnet 4.6 hit 94% on our insurance benchmark, making it the highest-performing model we've tested for computer use."
Swami Sivasubramanian, Twitter

"The hype is real, this is hands down the best (and most fun) LLM I've ever used! Head and shoulders above what I've seen so far."
WolframRavenwolf, Reddit

"Claude Sonnet 4.6 is hilarious, not just 'funny at times'. Broadly warm, honest, and prosocial."
Anton P., Twitter

"Sonnet 4.6 is so much better than Cline for coding tasks. I'm not even opening files manually anymore."
semibaron, Hacker News

"The 1M context window is a game changer for codebase migration. I just uploaded my whole legacy stack."
DevOpsDan, Reddit

"It handles complex spreadsheets and web forms with almost eerie precision. The Computer Use beta is finally ready."
AI_Insights_Daily, YouTube

Videos About Claude Sonnet 4.6

Watch tutorials, reviews, and discussions about Claude Sonnet 4.6

Sonnet 4.6 scored higher on GDPval, which measures meaningful real-world tasks.

It is becoming more difficult to even know if these models are capable of CBRN things.

The speed to intelligence ratio here is essentially unmatched by any other model on the market.

Anthropic is clearly focusing on the agentic side of the house with this release.

The cost structure makes this the new default for any high-volume API developer.

This model is around twice as fast as the Opus model from last month.

In conclusion, guys, this model is the best bang for your buck for enterprise coding.

The vision capabilities for interpreting complex architecture diagrams are significantly improved.

I was able to give it 50 files and it refactored the entire routing logic perfectly.

It feels much more human in its communication style compared to GPT-4o.

It's actually beating out Opus 4.6 in some areas while coming in at a 40% cheaper price point.

As we fill up the context window, once we hit about 150,000 tokens, the effectiveness tends to drop.

The adaptive reasoning feature allows you to basically toggle between speed and deep logic.

This release feels like the first true 'agent-first' model from Anthropic.

I would use this for everything except for maybe the absolute highest level creative writing.

Pro Tips for Claude Sonnet 4.6

Expert tips to help you get the most out of Claude Sonnet 4.6 and achieve better results.

Leverage Context Compaction

Enable the Compaction feature in the API to automatically summarize older conversation history for long sessions.
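
The idea behind compaction can be illustrated on the client side. The sketch below is not the API's Compaction feature, whose parameters are not described here. It is a hypothetical stand-in that replaces older messages with a single summary once a rough token estimate passes a limit. `compactHistory`, `estimateTokens`, the 4-characters-per-token heuristic, and the caller-supplied `summarize` function are all assumptions made for illustration.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical client-side compaction: when the estimated size of the history
// exceeds maxTokens, replace everything except the last keepRecent messages
// with one summary message produced by the caller-supplied summarize function.
function compactHistory(
  history: ChatMessage[],
  maxTokens: number,
  keepRecent: number,
  summarize: (older: ChatMessage[]) => string,
): ChatMessage[] {
  const total = history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  if (total <= maxTokens || history.length <= keepRecent) {
    return history; // small enough, or nothing old enough to summarize
  }
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  const summary: ChatMessage = {
    role: 'user',
    content: `Summary of earlier conversation: ${summarize(older)}`,
  };
  return [summary, ...recent];
}

// Example with a stub summarizer; a real one would likely call a model.
const longTurn = 'a'.repeat(400); // about 100 estimated tokens
const demoHistory: ChatMessage[] = [
  { role: 'user', content: longTurn },
  { role: 'assistant', content: longTurn },
  { role: 'user', content: longTurn },
  { role: 'assistant', content: longTurn },
];
const compacted = compactHistory(demoHistory, 200, 2, (older) => `${older.length} earlier messages`);
console.log(compacted.length); // prints 3: one summary plus the two most recent messages
```

Keeping the most recent turns verbatim preserves the details the model is most likely to need next, while the summary trades fidelity for space on older turns.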

Use Thinking Tokens Strategically

For math or complex logic, set a higher budget for thinking tokens to let the model explore multiple reasoning paths.
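
As a rough sketch of what setting a thinking budget could look like, the helper below builds a request body without sending it. It assumes the `thinking: { type: 'enabled', budget_tokens }` field documented for earlier Claude models, along with that feature's documented 1,024-token minimum; whether Claude Sonnet 4.6 accepts exactly this shape is an assumption. `buildThinkingRequest` and the `answerTokens` default are hypothetical names chosen for illustration.

```typescript
// Request fields used in this sketch. The thinking field follows the
// extended-thinking parameter documented for earlier Claude models; its
// applicability to Claude Sonnet 4.6 is an assumption.
interface ThinkingRequest {
  model: string;
  max_tokens: number;
  thinking: { type: 'enabled'; budget_tokens: number };
  messages: { role: 'user'; content: string }[];
}

const MIN_THINKING_BUDGET = 1024; // documented minimum for earlier models

// Hypothetical helper: reserves room for both the thinking budget and the answer.
function buildThinkingRequest(
  prompt: string,
  budgetTokens: number,
  answerTokens: number = 2048,
): ThinkingRequest {
  if (!Number.isInteger(budgetTokens) || budgetTokens < MIN_THINKING_BUDGET) {
    throw new RangeError(`budgetTokens must be an integer >= ${MIN_THINKING_BUDGET}`);
  }
  if (!Number.isInteger(answerTokens) || answerTokens <= 0) {
    throw new RangeError('answerTokens must be a positive integer');
  }
  return {
    model: 'claude-sonnet-4-6',
    // max_tokens has to exceed the thinking budget, because thinking tokens count toward it.
    max_tokens: budgetTokens + answerTokens,
    thinking: { type: 'enabled', budget_tokens: budgetTokens },
    messages: [{ role: 'user', content: prompt }],
  };
}

const thinkingRequest = buildThinkingRequest('Prove that the square root of 2 is irrational.', 8000);
console.log(thinkingRequest.max_tokens); // prints 10048 (8000 thinking + 2048 answer)
```

A larger budget gives the model more room to explore, at the cost of latency and output-token spend, so it may be worth starting small and raising the budget only for prompts that need it.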

Prompt for Spec Compliance

State the architectural specs and best practices you want followed explicitly, as the model naturally reaches for newer tools by default.

Utilize Artifacts for UI

Encourage the model to use UI Artifacts to keep generated code separate from the chat thread for real-time iteration.

Related AI Models

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context, $0.50/$3.00 per 1M tokens

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
200K context, $5.00/$25.00 per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context, $0.60/$3.60 per 1M tokens

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context, $0.28/$0.42 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25/$10.00 per 1M tokens

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context, $2.00/$12.00 per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context, $0.60/$2.50 per 1M tokens

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context, $3.00/$15.00 per 1M tokens
