
Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.

Anthropic · Claude 3 family · Released February 24, 2025
Context: 200K tokens
Max Output: 128K tokens
Input Price: $3.00 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
84.8%
GPQA: Graduate-Level Google-Proof Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts achieve only 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Claude 3.7 Sonnet scored 84.8% on this benchmark.
HLE
34%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning dozens of specialized academic subjects, designed to remain difficult after models saturated earlier tests. Claude 3.7 Sonnet scored 34% on this benchmark.
MMLU
89%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Claude 3.7 Sonnet scored 89% on this benchmark.
MMLU Pro
74%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Claude 3.7 Sonnet scored 74% on this benchmark.
SimpleQA
42%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Claude 3.7 Sonnet scored 42% on this benchmark.
IFEval
93.2%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Claude 3.7 Sonnet scored 93.2% on this benchmark.
AIME 2025
54.8%
AIME 2025: American Invitational Mathematics Examination. Competition-level problems from the 2025 AIME, a prestigious exam for talented high school students. Tests advanced mathematical problem-solving that requires abstract reasoning, not just pattern matching. Claude 3.7 Sonnet scored 54.8% on this benchmark.
MATH
96.2%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Claude 3.7 Sonnet scored 96.2% on this benchmark.
GSM8k
97%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Claude 3.7 Sonnet scored 97% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Claude 3.7 Sonnet scored 92% on this benchmark.
MathVista
70%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Claude 3.7 Sonnet scored 70% on this benchmark.
SWE-Bench
70.3%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Claude 3.7 Sonnet scored 70.3% on this benchmark.
HumanEval
94%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Claude 3.7 Sonnet scored 94% on this benchmark.
LiveCodeBench
65%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Claude 3.7 Sonnet scored 65% on this benchmark.
MMMU
75%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Claude 3.7 Sonnet scored 75% on this benchmark.
MMMU Pro
55%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Claude 3.7 Sonnet scored 55% on this benchmark.
ChartQA
89%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Claude 3.7 Sonnet scored 89% on this benchmark.
DocVQA
94%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Claude 3.7 Sonnet scored 94% on this benchmark.
Terminal-Bench
35.2%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Claude 3.7 Sonnet scored 35.2% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Claude 3.7 Sonnet scored 12% on this benchmark.

Try Claude 3.7 Sonnet Free

Chat with Claude 3.7 Sonnet for free. Test its capabilities, ask questions, and explore what this AI model can do.


About Claude 3.7 Sonnet

Learn about Claude 3.7 Sonnet's capabilities, features, and how it can help you achieve better results.

Hybrid Reasoning and Transparency

Claude 3.7 Sonnet represents a landmark shift in LLM architecture as Anthropic's first 'hybrid reasoning' model. It uniquely allows users to toggle between standard, low-latency responses and an 'extended thinking' mode that displays its internal chain-of-thought. This transparency provides users with a clear window into the model's logic, making it particularly effective for complex troubleshooting and high-stakes reasoning tasks.
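The toggle between standard responses and extended thinking is set per request via the API's `thinking` parameter. Below is a minimal sketch of building both request shapes; the `buildRequest` helper and the token numbers are illustrative, not part of the SDK:

```typescript
// Illustrative sketch: the same model serves fast replies and deep
// reasoning; only the request's `thinking` field changes.
type ThinkingConfig =
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" };

function buildRequest(prompt: string, extendedThinking: boolean) {
  const thinking: ThinkingConfig = extendedThinking
    ? { type: "enabled", budget_tokens: 4096 } // 1024 is the minimum budget
    : { type: "disabled" };
  return {
    model: "claude-3-7-sonnet-20250219",
    // When thinking is enabled, max_tokens must exceed budget_tokens.
    max_tokens: 8192,
    thinking,
    messages: [{ role: "user" as const, content: prompt }],
  };
}

const fast = buildRequest("Rename this variable across the file.", false);
const deep = buildRequest("Find the race condition in this log.", true);
// Either object can be passed to anthropic.messages.create(...) as-is.
```

With thinking enabled, the response also contains the model's thinking blocks alongside the final text, which is what makes the chain-of-thought visible to the user.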

Software Engineering Mastery

Designed with a strong focus on software engineering and production-ready outputs, the model has set new industry standards on benchmarks like SWE-Bench Verified. It excels in 'vibe coding,' where developers describe high-level intent and the model handles implementation across multiple files. It handles complex refactors and architectural decisions with precision that surpasses previous frontier models.

Massive Context and Agentic Tools

With a massive 200,000-token context window and an agentic toolset called Claude Code, it transforms from a simple chatbot into a collaborative technical partner. It can manage entire project lifecycles, from initial documentation review to automated git workflows and test execution, helping development stay fast and reliable.


Use Cases for Claude 3.7 Sonnet

Discover the different ways you can use Claude 3.7 Sonnet to achieve great results.

Vibe Coding

Building functional software from scratch by describing intent in natural language.

Advanced Debugging

Utilizing extended thinking to analyze complex logs and provide precise, one-shot fixes.

Large Context Analysis

Reviewing and refactoring entire codebases or lengthy technical documentation in a single prompt.

Agentic Development

Powering terminal-based tools like Claude Code to automate git workflows and test execution.

Frontend UI Generation

Creating elegant, maintainable React and Svelte components with built-in design sensibility.

Factual Research

Analyzing massive PDF documents and datasets with high accuracy and low hallucination rates.

Strengths

Industry-Leading Coding: Achieved a state-of-the-art 70.3% on SWE-bench Verified, solving real GitHub issues with unprecedented accuracy.
Visible Reasoning: Anthropic's first model to offer visible, user-controllable 'extended thinking' for complex, high-stakes problem-solving.
Agentic Integration: Specifically optimized for tool use and CLI interaction via the Claude Code agent framework for end-to-end task automation.
Superior Design Taste: Consistently generates more elegant, accessible, and maintainable UI code than other frontier models.

Limitations

Response Latency: Enabling 'extended thinking' mode significantly increases time to first token compared to standard responses.
Premium Pricing: Output costs of $15 per 1M tokens remain considerably higher than most 'mini' or open-weights alternatives.
No Native Audio/Video: Unlike GPT-4o or Gemini 2.0, it lacks native audio and video input processing.
Computational Cost: Deep reasoning sessions can rapidly consume token budgets and context limits during large-scale codebase refactors.

API Quick Start

anthropic/claude-3-7-sonnet-20250219

View Documentation
Anthropic SDK (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const msg = await anthropic.messages.create({
  model: "claude-3-7-sonnet-20250219",
  max_tokens: 2048, // must be greater than thinking.budget_tokens
  thinking: { type: "enabled", budget_tokens: 1024 }, // 1024 is the minimum budget
  messages: [
    {
      role: "user",
      content: "Write a high-performance Rust function for matrix multiplication.",
    },
  ],
});

console.log(msg.content);

Install the SDK and start making API calls in minutes.

What People Are Saying About Claude 3.7 Sonnet

See what the community thinks about Claude 3.7 Sonnet

"Claude 3.7 Sonnet is the best coding AI model in the world; it blew my mind on challenging tasks."
rawcell4772
reddit
"With a single prompt, it nailed everything perfectly on a complex TypeScript project."
rawcell4772
reddit
"Claude Code with Sonnet 3.7 is much better than Cline and currently the best tool."
peterkrueck
reddit
"The leap in quality with top-tier models like 3.7 has been transformative for my outlook."
lurking_horrors
reddit
"Claude 3.7 is straight gas hits different... highkey goated on God no cap"
Fireship
youtube
"Claude 3.7's reasoning mode is a complete paradigm shift for debugging logic."
DevLead99
x

Videos About Claude 3.7 Sonnet

Watch tutorials, reviews, and discussions about Claude 3.7 Sonnet

The new 3.7 model absolutely crushed all the other models... now capable of solving 70.3% of GitHub issues

Using a strongly typed language along with TDD are ways for the AI to validate that the code it writes is actually valid

The model is incredibly smart at following instructions

The performance on SWE-bench is actually insane

Visible reasoning is a game changer for transparency

Claude 3.7 Sonnet... it's probably the best LLM for code generation

If you use the API, you can output 128,000 tokens in one shot

The 128k output limit is a massive upgrade

Its design taste for frontend components is unmatched

Tool use and agentic capabilities are core to this model

Reasoning should be an integrated capability of Frontier models rather than a separate model entirely

Claude 3.7 manages to surpass those models [DeepSeek, o3] by a pretty significant amount

The latency is slightly higher in reasoning mode

It beats DeepSeek R1 on several instruction following tasks

Anthropic really focused on production-ready outputs

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Thinking Budget

Use the 'extended thinking' mode specifically for complex logic or architecture planning to get higher quality results.
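One way to apply this tip is to scale the thinking budget with task complexity instead of using a single global setting. The tiers and numbers below are illustrative defaults, not Anthropic recommendations:

```typescript
// Illustrative budget tiers: reserve large thinking budgets for the
// work that benefits from them; 1024 is the API's minimum budget.
const THINKING_BUDGETS = {
  quickEdit: 1024,     // renames, small fixes: minimal deliberation
  debugging: 8192,     // log analysis, multi-step logic
  architecture: 32000, // system design, large refactors
} as const;

function thinkingFor(task: keyof typeof THINKING_BUDGETS) {
  return { type: "enabled" as const, budget_tokens: THINKING_BUDGETS[task] };
}

const config = thinkingFor("architecture");
// → { type: "enabled", budget_tokens: 32000 }, ready to use as the
// `thinking` field of a messages.create call (with max_tokens set higher).
```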

Context Control

Regularly use /clear or restart chats to keep context costs down and prevent response quality from degrading as the conversation fills the window.

Verification

Ask Claude to write and run tests for its own code using the Claude Code tool to ensure production stability.

Markdown Specs

Provide feature requirements in structured Markdown files for better instruction following during large projects.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.
