anthropic

Claude Opus 4.5

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering a record-breaking 80.9% score on SWE-bench Verified and advanced autonomous agency for coding.

Anthropic | Claude | November 24, 2025
Context: 200K tokens
Max Output: 64K tokens
Input Price: $5.00 / 1M tokens
Output Price: $25.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
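
The listed prices and limits translate directly into per-request arithmetic. The sketch below is illustrative only: the constants are copied from the figures above, the helper names `estimateCostUsd` and `fitsInContext` are hypothetical, and it assumes that prompt tokens and generated tokens share the same 200K context window.

```typescript
// Figures copied from the spec list above; helper names are hypothetical.
const CONTEXT_WINDOW_TOKENS = 200_000;
const MAX_OUTPUT_TOKENS = 64_000;
const INPUT_USD_PER_MILLION = 5.0;
const OUTPUT_USD_PER_MILLION = 25.0;

/** Estimated cost in USD of one request at the listed prices. */
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) /
    1_000_000
  );
}

/** Assumption: prompt and generated tokens count against the same window. */
function fitsInContext(inputTokens: number, requestedOutputTokens: number): boolean {
  return (
    requestedOutputTokens <= MAX_OUTPUT_TOKENS &&
    inputTokens + requestedOutputTokens <= CONTEXT_WINDOW_TOKENS
  );
}

console.log(estimateCostUsd(100_000, 4_096).toFixed(4)); // 0.6024
console.log(fitsInContext(150_000, 64_000)); // false (214K exceeds 200K)
```

At these prices a 100K-token prompt with a 4,096-token reply costs roughly 60 cents, most of it input.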
Benchmarks
GPQA (Graduate-Level Science Q&A): 87%
A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof').

MMLU (Massive Multitask Language Understanding): 90.8%
A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities.

MMLU Pro (MMLU Professional Edition): 80%
An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science.

IFEval (Instruction Following Evaluation): 90%
Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements.

AIME 2025 (American Invitational Math Exam): 37%
Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching.

MATH (Mathematical Problem Solving): 85%
A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge.

GSM8k (Grade School Math 8K): 95%
8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations.

MGSM (Multilingual Grade School Math): 92%
The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages.

MathVista (Mathematical Visual Reasoning): 72%
Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning.

SWE-Bench (Software Engineering Benchmark): 80.9%
AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024.

HumanEval (Python Programming Problems): 90%
164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy.

LiveCodeBench (Live Coding Benchmark): 75%
Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills.

MMMU (Massive Multi-discipline Multimodal Understanding): 80.7%
Tests vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge.

MMMU Pro (MMMU Professional Edition): 60%
Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels.

ChartQA (Chart Question Answering): 90%
Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations.

DocVQA (Document Visual Question Answering): 94%
Tests the ability to extract and reason about information from document images including forms, reports, and scanned text.

Terminal-Bench (Terminal/CLI Tasks): 59.3%
Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills.

ARC-AGI (Abstraction and Reasoning Corpus for AGI): 37.6%
Tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization.

About Claude Opus 4.5

Learn about Claude Opus 4.5's capabilities, features, and how it can help you achieve better results.

The Pinnacle of Autonomous Agency

Claude Opus 4.5 represents Anthropic's most significant leap in frontier intelligence, specifically engineered for the most complex tasks in software engineering and autonomous operation. Released in late 2025, it shattered records on the SWE-bench Verified benchmark with a score of 80.9%, making it the first model to effectively automate large-scale debugging and system refactoring with minimal human intervention.

Intelligence with a Soul

Beyond its technical prowess, Opus 4.5 introduces a refined persona guided by Anthropic’s "soul document," emphasizing diplomatic honesty and nuanced helpfulness. This makes the model uniquely capable of understanding writerly taste and human-centric design. It is optimized for agentic workflows, featuring a 200,000-token context window and a specialized "effort parameter" that allows developers to scale reasoning depth against computational costs.

Multimodal Excellence

As a multimodal powerhouse, Opus 4.5 excels at vision-based tasks, from parsing dense architectural diagrams to extracting data from complex document layouts. Working in a terminal-native environment via Claude Code, it can perform system-wide audits and security patching, positioning it as a persistent, highly capable partner for professional engineering teams.

Use Cases for Claude Opus 4.5

Discover the different ways you can use Claude Opus 4.5 to achieve great results.

Autonomous Engineering

Automates the entire lifecycle of GitHub issues including reproduction, debugging, and testing.

System Administration

Conducts autonomous server audits and security patching through direct terminal interaction.

Architectural Refactoring

Ingests massive repositories to suggest and implement system-wide security hardening.

Complex Document Synthesis

Transforms hundreds of multi-page PDFs into structured financial models or data visualizations.

Creative Game Development

Generates functional 3D environments with working physics from single, complex prompts.

Persistent Research Assistant

Cross-references massive datasets to find non-obvious contradictions in legal or technical files.

Strengths

Record-Breaking Coding: Achieves 80.9% on SWE-bench Verified, automating complex software engineering assignments.
Superior Token Efficiency: Reaches frontier intelligence while using up to 76% fewer tokens than Sonnet for similar logic.
Massive 200K Context: Handles massive document sets and repositories with high-fidelity retrieval accuracy.
Autonomous Agent Logic: Optimized for long-running autonomous sessions through terminal-native tools and stop hooks.

Limitations

Premium Pricing Model: At $5/$25 per 1M tokens, it is significantly more expensive than mid-tier models.
Math Benchmark Gap: Trails specialized reasoning models in competition-level mathematics like the AIME test.
No Native Audio/Video: Currently lacks the ability to directly process audio or video streams without preprocessing.
High Execution Latency: Deep reasoning tasks can take substantial time, sometimes requiring hours-long sessions for agents.

API Quick Start

anthropic/claude-4.5-opus

anthropic SDK
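import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic({
  apiKey: process.env['ANTHROPIC_API_KEY'],
});

async function main() {
  const message = await client.messages.create({
    // Anthropic's published identifier for Opus 4.5; confirm it against the current model list.
    model: 'claude-opus-4-5-20251101',
    max_tokens: 4096,
    messages: [{ role: 'user', content: 'Perform a full system audit of this code for security flaws.' }],
  });
  // `content` is an array of blocks; only text blocks carry a `text` field.
  const first = message.content[0];
  if (first?.type === 'text') console.log(first.text);
}

main().catch(console.error);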

Install the SDK and start making API calls in minutes.
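
The quick start reads only the first element of `message.content`. When tools or reasoning are enabled, a response may contain several blocks and the first one is not necessarily text. The following is a minimal sketch of a helper that gathers every text block. The `Block` interface and the name `collectText` are hypothetical stand-ins defined here so the example runs without the SDK or a network call.

```typescript
// Hypothetical, simplified shape of a response content block.
interface Block {
  type: string;
  text?: string;
}

/** Joins the text of every text block, skipping tool or reasoning blocks. */
function collectText(blocks: Block[]): string {
  return blocks
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('\n');
}

// Example with a mixed response: only the two text blocks are kept.
const sample: Block[] = [
  { type: 'thinking' },
  { type: 'text', text: 'Found two issues.' },
  { type: 'tool_use' },
  { type: 'text', text: 'Both are in auth.ts.' },
];
console.log(collectText(sample));
```

Filtering on `type` rather than indexing position keeps the caller working if the block order changes.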

What People Are Saying About Claude Opus 4.5

See what the community thinks about Claude Opus 4.5

"Every single line of my production code was generated by Claude Code running on Opus 4.5" (Boris Cherny, X)
"Opus 4.5 is where you need to think about writerly taste and how it sounds like a human" (Nate B Jones, YouTube)
"Intelligence is finally getting cheaper; this model is 3x cheaper than the previous Opus" (BuildwithVignesh, Reddit)
"Claude Opus 4.5 broke a benchmark by being too clever and exploiting a loophole" (MetaKnowing, Reddit)
"The reasoning depth and coding capability are on another level compared to anything else" (Santosh Gupta, X)
"The agentic capabilities on the terminal via Claude Code make it a standout for devops" (hn_user_alpha, Hacker News)

Videos About Claude Opus 4.5

Watch tutorials, reviews, and discussions about Claude Opus 4.5

The price is now three times cheaper... $5 for a million input tokens.

This is the best result I've ever gotten back from a model on this single prompt Minecraft test.

Opus 4.5 scored higher than any human candidate has ever scored on this take-home exam.

The reasoning here isn't just following instructions; it's understanding intent.

If you are doing complex architectural work, this is the only model that handles it reliably.

80.9% on SWE-bench verified... and uses 50% fewer tokens than Sonnet.

Opus 4.5 is aimed squarely at professional software engineering, not hobbyist coding.

Beyond SWEBench, it posts a 15% gain over Sonnet on Terminal Bench.

The model is capable of long-running autonomous sessions that can last for hours.

Vision performance is noticeably more detailed when parsing dense technical diagrams.

Think of Claude Opus 4.5 as a persuasion layer and an absolute agentic monster.

A lot of engineers end up preferring Opus 4.5 because of the ergonomics and the harness.

The model is aware of its soul spec in an out-of-context manner.

Opus 4.5 exhibits a level of writerly taste that GPT-5.2 simply misses.

It uses a dynamic effort parameter to scale its intelligence based on the task.

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Use High Effort for Logic

Set the 'effort' parameter to 'high' for complex architectural tasks to ensure maximum reasoning depth.
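
As a rough illustration of this tip, the sketch below builds a request body that selects an effort level by task complexity. Several things here are assumptions rather than confirmed API details: the nesting of the setting under `output_config.effort`, the `'low' | 'medium' | 'high'` value set, and the choice of `'medium'` for simpler tasks. Check Anthropic's API reference for the exact field name and for any beta header the feature may require. `SketchRequest` and `buildRequest` are hypothetical names.

```typescript
// Assumed set of effort levels; verify against the API reference.
type Effort = 'low' | 'medium' | 'high';

// Hypothetical request shape for illustration, not the SDK's own type.
interface SketchRequest {
  model: string;
  max_tokens: number;
  messages: { role: 'user' | 'assistant'; content: string }[];
  // Assumption: the effort setting is nested under `output_config`.
  output_config: { effort: Effort };
}

/** Builds a request body, using high effort only for complex tasks. */
function buildRequest(model: string, prompt: string, isComplex: boolean): SketchRequest {
  return {
    model,
    max_tokens: 4096,
    messages: [{ role: 'user', content: prompt }],
    // 'medium' for simpler tasks is an illustrative default, not a documented one.
    output_config: { effort: isComplex ? 'high' : 'medium' },
  };
}

const request = buildRequest('your-model-id', 'Plan a migration of the billing service.', true);
console.log(request.output_config.effort); // high
```

Keeping the effort choice in one function makes it easy to lower the level for routine calls and control cost.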

Deploy Stop Hooks

Utilize specialized stop hooks in agentic workflows to allow the model to run and self-correct over several hours.

Leverage Claude Code

Pair the model with the Claude Code CLI tool to unlock its full potential for terminal-native system tasks.

Optimize Token Usage

Reserve Opus 4.5 for reasoning-heavy tasks, where its higher per-token price can be offset by token efficiency: it can match Sonnet's quality with up to 76% fewer tokens.
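
To see when the 76% figure outweighs the price difference, the arithmetic can be worked through for output tokens. This sketch assumes Sonnet's output price is the $15.00 per 1M tokens listed for Claude Sonnet 4.5 on this page, and it ignores input tokens entirely. The function name `relativeOutputCost` is hypothetical.

```typescript
// Output prices per 1M tokens as listed on this page.
const OPUS_OUTPUT_USD_PER_MILLION = 25;
const SONNET_OUTPUT_USD_PER_MILLION = 15;

/**
 * Opus output cost as a fraction of Sonnet's for the same task,
 * given the fraction of output tokens Opus saves (0.76 means 76% fewer).
 */
function relativeOutputCost(tokenReduction: number): number {
  return ((1 - tokenReduction) * OPUS_OUTPUT_USD_PER_MILLION) / SONNET_OUTPUT_USD_PER_MILLION;
}

// Token reduction at which both models cost the same for output.
const breakEvenReduction = 1 - SONNET_OUTPUT_USD_PER_MILLION / OPUS_OUTPUT_USD_PER_MILLION;

console.log(relativeOutputCost(0.76).toFixed(2)); // 0.40
console.log(relativeOutputCost(0.5).toFixed(2)); // 0.83
console.log(breakEvenReduction.toFixed(2)); // 0.40
```

Under these assumptions Opus 4.5 is cheaper on output only when it saves more than 40% of the tokens, so the benefit depends on the task actually reaching that level of efficiency.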

Related AI Models

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context | $3.00/$15.00 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context | $1.25/$10.00 per 1M tokens

GLM-4.7 (Zhipu AI)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context | $0.60/$2.20 per 1M tokens

Claude 3.7 Sonnet (Anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200K context window, and visible thinking.
200K context | $3.00/$15.00 per 1M tokens

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128K context window, and real-time integration with X for live research and coding.
128K context | $3.00/$15.00 per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context | $0.50/$3.00 per 1M tokens

Claude Sonnet 4.5 (Anthropic)
Anthropic's Claude Sonnet 4.5 delivers world-leading coding (77.2% SWE-bench) and a 200K context window, optimized for the next generation of autonomous agents.
200K context | $3.00/$15.00 per 1M tokens

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context | $2.00/$12.00 per 1M tokens

Frequently Asked Questions

Find answers to common questions about this model