anthropic

Claude 4.5 Sonnet

Anthropic's Claude Sonnet 4.5 delivers world-leading coding (77.2% SWE-bench) and a 200K context window, optimized for the next generation of autonomous agents.

AI Coding · Agentic AI · Hybrid Reasoning · Anthropic · Multimodal
Anthropic · Claude · September 29, 2025
Context: 200K tokens
Max Output: 64K tokens
Input Price: $3.00 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
78%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude 4.5 Sonnet scored 78% on this benchmark.
HLE
40%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning dozens of specialized domains, designed to probe reasoning at the frontier of human knowledge. Claude 4.5 Sonnet scored 40% on this benchmark.
MMLU
91%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude 4.5 Sonnet scored 91% on this benchmark.
MMLU Pro
83%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude 4.5 Sonnet scored 83% on this benchmark.
SimpleQA
52%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Claude 4.5 Sonnet scored 52% on this benchmark.
IFEval
88%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude 4.5 Sonnet scored 88% on this benchmark.
AIME 2025
84%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Claude 4.5 Sonnet scored 84% on this benchmark.
MATH
91%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude 4.5 Sonnet scored 91% on this benchmark.
GSM8k
99%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude 4.5 Sonnet scored 99% on this benchmark.
MGSM
95%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude 4.5 Sonnet scored 95% on this benchmark.
MathVista
73%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude 4.5 Sonnet scored 73% on this benchmark.
SWE-Bench
77%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude 4.5 Sonnet scored 77% on this benchmark.
HumanEval
94%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude 4.5 Sonnet scored 94% on this benchmark.
LiveCodeBench
68%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude 4.5 Sonnet scored 68% on this benchmark.
MMMU
78%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude 4.5 Sonnet scored 78% on this benchmark.
MMMU Pro
55%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude 4.5 Sonnet scored 55% on this benchmark.
ChartQA
89%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude 4.5 Sonnet scored 89% on this benchmark.
DocVQA
92%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude 4.5 Sonnet scored 92% on this benchmark.
Terminal-Bench
58%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude 4.5 Sonnet scored 58% on this benchmark.
ARC-AGI
77%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude 4.5 Sonnet scored 77% on this benchmark.

About Claude 4.5 Sonnet

Learn about Claude 4.5 Sonnet's capabilities, features, and how it can help you achieve better results.

**The Frontier of Agentic Intelligence**

Claude 4.5 Sonnet represents a major advancement in frontier intelligence, optimized for the era of autonomous AI agents. Released in late 2025, it is a hybrid reasoning model that allows developers to toggle between high-speed execution for routine tasks and extended thinking for complex logical challenges. It leads benchmarks in computer use and tool orchestration, making it a preferred engine for terminal-based agents and multi-file software engineering.
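The speed-versus-thinking toggle described above maps to a single request parameter in Anthropic's Messages API. A minimal sketch of building the two request shapes (the model alias and token budgets here are illustrative, not prescriptive):

```typescript
// Two request payloads for the hybrid reasoning model: one fast,
// one with extended thinking. The `thinking` block is Anthropic's
// documented switch; budget_tokens caps internal reasoning tokens
// and must be smaller than max_tokens.
type ThinkingConfig =
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" };

function buildRequest(prompt: string, thinking: ThinkingConfig) {
  return {
    model: "claude-sonnet-4-5", // alias; pin a dated snapshot in production
    max_tokens: 16000,
    thinking,
    messages: [{ role: "user" as const, content: prompt }],
  };
}

// Routine task: skip thinking for speed.
const fast = buildRequest("Rename this variable.", { type: "disabled" });

// Hard task: allocate an explicit reasoning budget.
const deep = buildRequest("Design a migration plan for this schema.", {
  type: "enabled",
  budget_tokens: 8192,
});
```

Either object can be passed to `anthropic.messages.create(...)`; note that thinking tokens are billed as output tokens.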

**Precision and Reduced Hallucinations**

The model architecture prioritizes logic and precision, reducing the sycophancy and hallucinations observed in earlier series. With a 64,000-token output limit and a 200,000-token input window, it can process entire repositories while generating full application files in a single pass. It introduces native checkpoints for agentic workflows, allowing systems to roll back and correct mistakes autonomously without human intervention.
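Generations approaching the 64,000-token output ceiling are best consumed incrementally rather than awaited as one blob. The consumption pattern can be sketched offline by modeling the SDK's text stream as any `AsyncIterable<string>` (the fake generator below is a stand-in, not real API output):

```typescript
// Accumulate a long generation chunk-by-chunk, as you would when
// consuming the SDK's streaming helper, instead of buffering the
// full 64K-token response in a single await.
async function collectStream(chunks: AsyncIterable<string>): Promise<string> {
  let out = "";
  for await (const chunk of chunks) {
    out += chunk; // in practice: write each chunk to disk or the UI
  }
  return out;
}

// Illustrative stand-in for a real model stream.
async function* fakeChunks() {
  yield "export function ";
  yield "add(a: number, b: number) ";
  yield "{ return a + b; }";
}
```

With the real SDK, the same loop runs over the events yielded by `anthropic.messages.stream(...)`.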

**Multimodal and Reasoning Prowess**

Beyond software development, Sonnet 4.5 excels in multimodal document analysis and financial modeling. Its internal logic prioritizes architectural context, enabling it to map out large-scale systems more effectively than predecessors. Whether processing handwritten notes or implementing API integrations, the model maintains high factual accuracy and strict instruction following across long-horizon tasks.

Claude 4.5 Sonnet

Use Cases

Discover the different ways you can use Claude 4.5 Sonnet to achieve great results.

Autonomous Software Engineering

Managing end-to-end development from initial requirements to automated commits using terminal interfaces.

GUI-Based Automation

Automating web browsing and data entry into legacy applications using native computer use capabilities.

Multi-Agent Orchestration

Delegating specialized tasks to sub-agents like reviewers and builders within a central planning loop.

Complex Code Refactoring

Re-architecting multi-file codebases while maintaining consistency across 200,000 tokens of active context.

Nuanced Financial Analysis

Analyzing quarterly reports and spreadsheets with vision to identify discrepancies and investment insights.

Interactive Data Visualization

Generating dynamic charts from complex datasets using embedded code execution and real-time building.

Strengths

Native Computer Use: The model interacts with operating systems via cursor movement and GUI manipulation at 61.4 percent accuracy.
Elite Coding Performance: It achieves 77.2 percent on SWE-bench Verified, leading all other models in resolving GitHub issues.
30-Hour Task Horizon: The architecture allows for 30 hours of continuous autonomous work while maintaining state and focus.
64K Output Limit: Massive output capacity enables generating entire application architectures in a single API call.

Limitations

No Native Audio Input: The model cannot directly process audio files as a native modality and requires external transcription tools.
Reasoning Token Cost: Tokens used during internal extended thinking are billed as output tokens, increasing the cost for complex queries.
Latency in Thinking Mode: When extended thinking is enabled, the model can take several minutes to process complex architectural plans.
Competitive Math Variance: While leading in coding, it occasionally trails specialized reasoning models in specific competitive programming benchmarks.

API Quick Start

anthropic/claude-4-5-sonnet

View Documentation
anthropic SDK
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Analyze this codebase for security flaws." }
  ],
});

console.log(response.content[0].text);

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Claude 4.5 Sonnet

Claude 4.5 Sonnet is available everywhere today, the best coding model in the world.
ClaudeOfficial
reddit
This fixes one of the most painful scaling issues with MCP setups. Was watching context evaporate before any actual work began.
Simon Willison
twitter
Claude Code-Sonnet 4.5 is way ahead of Gemini 3.0 Pro for complex Dockerized refactoring tasks.
Comfortable-Friend96
reddit
The pattern: Mistakes become documentation. You add a rule to CLAUDE.md and it never happens again.
Boris Cherny
twitter
The hybrid reasoning mode is a lifesaver for debugging complex async logic where regular models just loop.
AsyncDev
hackernews
Pricing parity with 3.5 Sonnet makes this an easy upgrade for all our production agent pipelines.
StartupFounder2025
reddit

Related Videos

Watch tutorials, reviews, and discussions about Claude 4.5 Sonnet

This new 4.5 Sonnet model is outperforming even Opus 4.1 on the SWE-bench Verified test

It was able to maintain focus for over 30 hours on complex multi-step tasks

It leads the OSWorld computer use benchmark with a score of 61.4 percent

The internal reasoning engine handles Python environments with far more stability than 3.5

Terminal integration feels much tighter with almost zero hallucinated shell commands

Sonnet 4.5 is now leading in agentic tool use... a 20 percent jump, which is really exciting

Claude code with Sonnet 4.5 finished the entire Stripe implementation in 15 minutes

Claude Sonnet 4.5 was faster by a lot and better by a decent amount

The thinking toggle allows you to throw more compute at specific blocks of code

It retains context perfectly even when you're 150,000 tokens deep into a massive project

It is the best performing model ever when it controls your computer

Drop in error rates for coding from 9 percent to basically zero

Claude Imagine might be the coolest feature... a real-time app building experience

The MCP integration allows it to search tools without eating up your prompt context

Vision latency is significantly reduced when analyzing complex UI layouts

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Claude 4.5 Sonnet and achieve better results.

Enable MCP Tool Search

Use Model Context Protocol Tool Search to reduce context usage by 85 percent and leave room for active files.

Leverage Agentic Checkpoints

Use the /checkpoint command in terminal interfaces to save progress before major refactors for instant rollback.

Context Budgeting

Clear the history between unrelated tasks to prevent context rot and maintain high logic accuracy.
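The tip above can be sketched as a small conversation wrapper: keep one message history per task and reset it when switching to an unrelated task, rather than letting stale turns accumulate in the window. All names here are illustrative:

```typescript
// Context budgeting sketch: a per-task message history with an
// explicit reset between unrelated tasks. Stale turns only cost
// tokens and invite "context rot".
type Msg = { role: "user" | "assistant"; content: string };

class TaskSession {
  private history: Msg[] = [];

  addTurn(user: string, assistant: string): void {
    this.history.push({ role: "user", content: user });
    this.history.push({ role: "assistant", content: assistant });
  }

  // Call when moving to an unrelated task.
  reset(): void {
    this.history = [];
  }

  get messages(): readonly Msg[] {
    return this.history;
  }
}

const session = new TaskSession();
session.addTurn("Refactor the auth module", "Done: extracted middleware.");
session.reset(); // unrelated task follows — drop the old turns
session.addTurn("Summarize the Q3 report", "Revenue grew quarter over quarter.");
```

The `messages` array is what you would pass to the API on each call; after `reset()`, only turns from the current task remain.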

System Prompt Hierarchy

Define the model persona and strict output constraints in a dedicated configuration file for cross-agent consistency.
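One way to realize this tip is a single shared config that composes the `system` string for every agent. The field names below are illustrative, not an Anthropic API schema:

```typescript
// Shared system-prompt config: persona and output constraints defined
// once, then composed into the `system` string each agent sends.
const agentConfig = {
  persona: "Senior TypeScript reviewer. Terse, cites line numbers.",
  constraints: [
    "Respond in GitHub-flavored Markdown.",
    "Never exceed 300 words.",
  ],
};

// Compose one system prompt from the config, constraints as a list.
function buildSystemPrompt(cfg: typeof agentConfig): string {
  return [cfg.persona, ...cfg.constraints.map((c) => `- ${c}`)].join("\n");
}

const system = buildSystemPrompt(agentConfig);
```

Passing the same composed `system` value to each sub-agent keeps persona and output constraints consistent across the fleet.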

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

openai

GPT-5.3 Codex

OpenAI

GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...

400K context
$1.75/$14.00/1M
openai

GPT-5.4

OpenAI

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

1M context
$2.50/$15.00/1M
moonshot

Kimi K2 Thinking

Moonshot

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...

256K context
$0.60/$2.50/1M
openai

GPT-5.2

OpenAI

GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.

400K context
$1.75/$14.00/1M
anthropic

Claude 3.7 Sonnet

Anthropic

Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.

200K context
$3.00/$15.00/1M
zhipu

GLM-5

Zhipu (GLM)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context
$1.00/$3.20/1M
google

Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M
anthropic

Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M

Frequently Asked Questions

Find answers to common questions about Claude 4.5 Sonnet