
Claude Sonnet 4.5

Anthropic's Claude Sonnet 4.5 delivers world-leading coding performance (77.2% on SWE-bench Verified) and a 200K context window, optimized for the next generation of autonomous agents.

Anthropic · Claude 4 family · Released September 29, 2025
Context
200K tokens
Max Output
64K tokens
Input Price
$3.00 / 1M tokens
Output Price
$15.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
83%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude Sonnet 4.5 scored 83% on this benchmark.
HLE
34%
HLE: Humanity's Last Exam. A benchmark of thousands of expert-written questions spanning more than a hundred subjects, designed to sit at the frontier of human knowledge and resist simple lookup. Claude Sonnet 4.5 scored 34% on this benchmark.
MMLU
89%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude Sonnet 4.5 scored 89% on this benchmark.
MMLU Pro
78%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude Sonnet 4.5 scored 78% on this benchmark.
SimpleQA
52%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Claude Sonnet 4.5 scored 52% on this benchmark.
IFEval
88%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude Sonnet 4.5 scored 88% on this benchmark.
AIME 2025
87%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Claude Sonnet 4.5 scored 87% on this benchmark.
MATH
87%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude Sonnet 4.5 scored 87% on this benchmark.
GSM8k
98%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude Sonnet 4.5 scored 98% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude Sonnet 4.5 scored 92% on this benchmark.
MathVista
72%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude Sonnet 4.5 scored 72% on this benchmark.
SWE-Bench
77%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude Sonnet 4.5 scored 77% on this benchmark.
HumanEval
94%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude Sonnet 4.5 scored 94% on this benchmark.
LiveCodeBench
68%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude Sonnet 4.5 scored 68% on this benchmark.
MMMU
78%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude Sonnet 4.5 scored 78% on this benchmark.
MMMU Pro
55%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude Sonnet 4.5 scored 55% on this benchmark.
ChartQA
89%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude Sonnet 4.5 scored 89% on this benchmark.
DocVQA
92%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude Sonnet 4.5 scored 92% on this benchmark.
Terminal-Bench
50%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude Sonnet 4.5 scored 50% on this benchmark.
ARC-AGI
14%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude Sonnet 4.5 scored 14% on this benchmark.

About Claude Sonnet 4.5

Learn about Claude Sonnet 4.5's capabilities, features, and how it can help you achieve better results.

**The Frontier of Agentic Intelligence**

Claude Sonnet 4.5 represents Anthropic's most significant leap in "frontier intelligence," specifically optimized for the era of autonomous AI agents. Released in September 2025, it continues Anthropic's "hybrid reasoning" design, letting developers toggle between high-speed execution for routine tasks and extended thinking for complex logical challenges. At launch it set new highs on computer-use and tool-orchestration benchmarks, making it a preferred engine for terminal-based agents and multi-file software engineering.

**Precision and Reduced Hallucinations**

The model is built on an architecture that prioritizes "measure twice, cut once" logic, significantly reducing the sycophancy and hallucinations seen in the 3.5 series. With a 64,000-token output limit and a 200,000-token input window, it can ingest entire repositories while generating full-length application files in a single pass. Its launch also introduced "checkpoints" for agentic workflows in Claude Code, letting sessions roll back to earlier states so an agent's mistakes can be undone and corrected.
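
As a rough sketch of putting that output budget to use, the example below streams a single long generation with the official @anthropic-ai/sdk (the same SDK shown in the API Quick Start); the prompt and model ID are illustrative, and streaming simply keeps very long requests from timing out.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Stream one long generation up to the 64K output cap; streaming avoids
// the HTTP timeouts that very long non-streaming requests can run into.
const stream = anthropic.messages.stream({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 64000,
  messages: [
    { role: 'user', content: 'Generate a complete Express + TypeScript scaffold for a todo API.' },
  ],
});

// Print text deltas as they arrive, then collect the assembled message.
stream.on('text', (delta) => process.stdout.write(delta));
const finalMessage = await stream.finalMessage();
console.log(`\nstop_reason: ${finalMessage.stop_reason}`);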

**Multimodal and Reasoning Prowess**

Beyond coding, Sonnet 4.5 dominates in multimodal document analysis and complex financial modeling. Its internal logic is trained to prioritize architectural context, enabling it to map out large-scale codebases better than any predecessor. Whether processing handwritten notes or implementing a full Stripe integration, Sonnet 4.5 maintains a high level of factual accuracy and instruction following.
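
For the document-analysis side, here is a minimal, hedged sketch of sending an image alongside a text prompt through the Messages API; the file path and prompt are hypothetical, while the base64 image content block follows the SDK's standard vision input format.

import Anthropic from '@anthropic-ai/sdk';
import { readFileSync } from 'node:fs';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Hypothetical scanned page, sent as a base64-encoded image block.
const pageBase64 = readFileSync('./reports/q3-balance-sheet.png').toString('base64');

const msg = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 2048,
  messages: [
    {
      role: 'user',
      content: [
        { type: 'image', source: { type: 'base64', media_type: 'image/png', data: pageBase64 } },
        { type: 'text', text: 'Extract every line item and its value from this balance sheet as JSON.' },
      ],
    },
  ],
});

console.log(msg.content);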


Use Cases for Claude Sonnet 4.5

Discover the different ways you can use Claude Sonnet 4.5 to achieve great results.

Autonomous Software Engineering

Use Claude Sonnet 4.5 to navigate complex codebases, implement features across multiple files, and run tests independently.
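
To make the "run tests independently" part concrete, here is a minimal sketch of an agentic loop over the Messages API's tool calling; runShell is a hypothetical sandboxed executor (not part of the SDK), and the loop shape (call the model, run requested tools, feed results back) is the standard pattern rather than Anthropic's own agent harness.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Hypothetical sandboxed command runner; replace with a real executor.
async function runShell(command: string): Promise<string> {
  return `(pretend output of: ${command})`;
}

const tools: Anthropic.Tool[] = [
  {
    name: 'run_shell',
    description: 'Run a shell command in the project repository and return its output.',
    input_schema: {
      type: 'object',
      properties: { command: { type: 'string' } },
      required: ['command'],
    },
  },
];

const messages: Anthropic.MessageParam[] = [
  { role: 'user', content: 'Run the test suite and fix the first failing test you find.' },
];

// Loop: call the model, execute any tools it requests, feed the results
// back, and stop once it no longer asks for a tool.
while (true) {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 4096,
    tools,
    messages,
  });

  messages.push({ role: 'assistant', content: response.content });
  if (response.stop_reason !== 'tool_use') break;

  const toolResults: { type: 'tool_result'; tool_use_id: string; content: string }[] = [];
  for (const block of response.content) {
    if (block.type === 'tool_use') {
      const { command } = block.input as { command: string };
      toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: await runShell(command) });
    }
  }
  messages.push({ role: 'user', content: toolResults });
}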

Computer-Use Agents

Deploy the model to control desktops and web browsers for data extraction, legacy system navigation, or repetitive administrative tasks.

Enterprise Agentic Search

Orchestrate multi-step search queries and synthesize disparate information from internal documentation and the live web.

Complex Financial Modeling

Leverage its strong quantitative reasoning (87% on AIME 2025) to perform deep logical deductions on financial reports and market data.

Technical Content Refinement

Convert high-level requirements into professional PRDs, technical specifications, and copy-paste-ready codebases.

Multimodal Document Analysis

Process thousands of pages of charts, handwritten notes, and technical diagrams with state-of-the-art vision capabilities.

Strengths

Agentic Coding Power: State of the art at launch on SWE-bench Verified, with a 77.2% success rate on real GitHub issues.
Incredible Speed: Operates at 40-60 tokens per second, making it significantly faster than previous frontier models for interactive use.
Hybrid Reasoning Flexibility: Effectively balances a "fast chat" mode with "extended thinking" for complex logical chains.
Massive Output Window: A 64K output-token limit allows entire multi-file features to be generated in a single API call.

Limitations

Usage Caps: Professional users often report hitting weekly usage limits quickly on the $20/month Pro plan.
Search Latency: Agentic web browsing (BrowseComp) remains a weak spot compared to specialized search models.
Niche Knowledge Gaps: Struggles with highly specialized visual tasks, such as identifying specific skateboarding tricks (29% accuracy on SkateBench).
Agentic Costs: Running the model autonomously in terminal mode can consume $50-$100 in tokens for a single complex app-building session.
API Quick Start

anthropic/claude-sonnet-4.5

View Documentation
Anthropic SDK (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const msg = await anthropic.messages.create({
  model: "claude-sonnet-4.5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Implement a rate limiter in Node.js" }],
});

console.log(msg.content[0].text);

Install the SDK and start making API calls in minutes.

What People Are Saying About Claude Sonnet 4.5

See what the community thinks about Claude Sonnet 4.5

"Claude Sonnet 4.5 is the new king of AI coding... it is looking really, really good"
James Montemagno
youtube
"Sonnet 4.5 is doing a really good job... it was faster by a lot and better by a decent amount"
Cole Medin
youtube
"I'm blown away by Sonnet 4.5... this one is designing some absolutely stunning pages"
Savage Reviews
youtube
"The terminal-based agent is a 'developer living in your terminal'... it can read codebases and run tests autonomously"
DevUser_99
reddit
"Pricing remains the same as 3.5, but the 'Checkpoints' feature makes it worth 10x more for professional workflows"
AgentArchitect
x
"At 77.2% on SWE-bench, this is the first model that actually feels like a Senior Engineer"
HackerNewsReader
hackernews

Videos About Claude Sonnet 4.5

Watch tutorials, reviews, and discussions about Claude Sonnet 4.5

Anthropic claims this is the 'best code model in the world' with substantial gains in reasoning, math, and computer use.

While GPT-5 might be better for high-level planning, Claude 4.5 Sonnet is currently the 'nicest' model to use for implementation.

The speed is just incredible, making interactive coding feel much more fluid.

It handles multi-file edits with a level of precision we haven't seen before.

The reduction in hallucinations makes it a reliable partner for production code.

Claude Sonnet 4.5 was faster by a lot and better by a decent amount than GPT-5 Codex.

It did the entire Stripe implementation in 15 minutes... more than two times faster than Opus 4.1.

The ability to follow complex tool-calling instructions is its secret sauce.

I'm seeing fewer 'sycophancy' issues where the model just agrees with my bad ideas.

This is the first model I'd actually trust to run a terminal-based agent unsupervised.

This is one of the best landing pages, if not THE best landing page, I've ever seen created from a prompt.

It's an absolute beast... it is designing some absolutely stunning pages with really, really nice code.

The vision capabilities for interpreting UI design are significantly upgraded.

It feels like it understands the aesthetic requirements, not just the technical ones.

Sonnet 4.5 is officially the new benchmark for creative front-end engineering.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Leverage CLAUDE.md

Use a CLAUDE.md file in your repository root to give the model short summaries and pointers; this reduces token waste by 30%.
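
A hypothetical example of the terse, pointer-style CLAUDE.md this tip describes (paths, commands, and conventions are illustrative):

Monorepo managed with pnpm workspaces: apps live in apps/, shared packages in packages/.
Build: pnpm build · Test: pnpm test · Lint: pnpm lint
Key entry points: apps/api/src/server.ts, apps/web/src/main.tsx
Conventions: strict TypeScript, no default exports, tests live next to source as *.test.ts.
Architectural decisions: see .claude/context.md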

Hybrid Reasoning Toggle

Use the 'thinking' parameter in your API calls only for logic-heavy tasks to save on latency and costs during routine operations.
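
A minimal sketch of that toggle, assuming the Messages API's thinking parameter; the budget and prompts are illustrative, and max_tokens must exceed the thinking budget.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Routine request: no extended thinking, lowest latency and cost.
const quick = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Rename the variable cfg to config in this snippet: ...' }],
});

// Logic-heavy request: enable extended thinking with an explicit token budget.
// Note that max_tokens must be larger than budget_tokens.
const deep = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 16000,
  thinking: { type: 'enabled', budget_tokens: 8000 },
  messages: [{ role: 'user', content: 'Plan a migration that splits this monolith into two services: ...' }],
});

console.log(quick.content, deep.content);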

The .claude/context Folder

Create a .claude/context.md file to store architectural decisions; the model is specifically trained to prioritize this path for codebase mapping.

Prompt Caching

Enable prompt caching for static documentation or large codebases to save up to 90% on input costs for repeated queries.
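
A hedged sketch of that pattern: a large, static system prompt is marked with cache_control so repeated requests reuse the cached prefix; the documentation text is a placeholder.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Placeholder for a large static context (internal docs, style guide, schema, ...).
const bigDocs = '...tens of thousands of tokens of internal documentation...';

const msg = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: bigDocs,
      // Marks the prompt prefix up to this block as cacheable; later requests
      // that reuse the identical prefix read it back at a steep discount.
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'Where do we document the refund API?' }],
});

// usage.cache_creation_input_tokens and usage.cache_read_input_tokens show cache activity.
console.log(msg.usage);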

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Frequently Asked Questions

Find answers to common questions about this model