
GPT-5.2

GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.

OpenAI · GPT-5 family · December 11, 2025
Context: 400K tokens
Max Output: 4K tokens
Input Price: $1.75 / 1M tokens
Output Price: $14.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
85.7%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GPT-5.2 scored 85.7% on this benchmark.
HLE
84.1%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of academic subjects, built to remain difficult after older benchmarks saturated. Evaluates expert-level reasoning on problems that require professional-level knowledge. GPT-5.2 scored 84.1% on this benchmark.
MMLU
92.5%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GPT-5.2 scored 92.5% on this benchmark.
MMLU Pro
75.4%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GPT-5.2 scored 75.4% on this benchmark.
SimpleQA
58%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GPT-5.2 scored 58% on this benchmark.
IFEval
89.4%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GPT-5.2 scored 89.4% on this benchmark.
AIME 2025
100%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GPT-5.2 scored 100% on this benchmark.
MATH
94.3%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GPT-5.2 scored 94.3% on this benchmark.
GSM8k
93.6%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GPT-5.2 scored 93.6% on this benchmark.
MGSM
92.1%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GPT-5.2 scored 92.1% on this benchmark.
MathVista
78.4%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GPT-5.2 scored 78.4% on this benchmark.
SWE-Bench
81.5%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GPT-5.2 scored 81.5% on this benchmark.
HumanEval
95.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GPT-5.2 scored 95.2% on this benchmark.
LiveCodeBench
76.2%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GPT-5.2 scored 76.2% on this benchmark.
MMMU
86.5%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GPT-5.2 scored 86.5% on this benchmark.
MMMU Pro
65%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GPT-5.2 scored 65% on this benchmark.
ChartQA
88.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. GPT-5.2 scored 88.5% on this benchmark.
DocVQA
94.1%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. GPT-5.2 scored 94.1% on this benchmark.
Terminal-Bench
77.3%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GPT-5.2 scored 77.3% on this benchmark.
ARC-AGI
90.5%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GPT-5.2 scored 90.5% on this benchmark.

About GPT-5.2

Learn about GPT-5.2's capabilities, features, and how it can help you achieve better results.

GPT-5.2 is OpenAI's flagship reasoning model designed for high-stakes professional knowledge work and autonomous engineering. Released on December 11, 2025, it marks a significant evolution from the GPT-4 and o1 series by integrating a dedicated Thinking mode with effort controls (Medium, High, Extra High). This allows the model to pause and verify multi-step logic before generating a response.

With a massive 400K context window and nearly 100% recall, it is engineered for senior-level code reviews, complex refactoring, and scientific research. The model architecture is built to support agentic workflows, featuring native tool-calling and multimodal vision that can process intricate technical diagrams and codebases simultaneously.
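For inputs approaching that limit, a rough pre-flight size check helps avoid truncation. The sketch below uses the common ~4-characters-per-token heuristic for English text and code; it is an approximation for budgeting, not a real tokenizer:

```javascript
// Rough pre-flight check against the 400K-token context window.
// CHARS_PER_TOKEN is a coarse heuristic; use a real tokenizer
// (e.g. tiktoken) when you need exact counts.
const CONTEXT_WINDOW = 400_000;
const CHARS_PER_TOKEN = 4;

function roughTokenCount(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Reserve headroom for the model's reply (4K max output by default).
function fitsInContext(text, reservedForOutput = 4_000) {
  return roughTokenCount(text) + reservedForOutput <= CONTEXT_WINDOW;
}
```

A check like this is cheap enough to run before every large submission, then fall back to chunking or summarization when it fails.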

While it excels in logical precision and engineering benchmarks, including a 100% score on AIME 2025, it adopts a more formal, machine-like tone than competitors like Claude. It is currently priced at $1.75 per million input tokens and $14.00 per million output tokens, making it a cost-effective option for deep reasoning tasks that previously required extensive human oversight.
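At those rates, long reasoning runs are dominated by output tokens. A minimal cost sketch in plain JavaScript, assuming thinking tokens are billed at the output rate (common for reasoning APIs, but confirm against OpenAI's pricing docs):

```javascript
// Estimate per-request cost at the listed GPT-5.2 rates:
// $1.75 per 1M input tokens, $14.00 per 1M output tokens.
const INPUT_PRICE_PER_M = 1.75;
const OUTPUT_PRICE_PER_M = 14.0;

function estimateCostUSD(inputTokens, outputTokens) {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M
  );
}

// Illustrative run: 50K tokens of context in, 20K tokens out
// (including thinking tokens, assumed billed as output).
console.log(estimateCostUSD(50_000, 20_000).toFixed(4)); // ~0.3675
```

The asymmetry (8x between input and output pricing) is why capping reasoning effort on routine tasks matters more than trimming prompts.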

GPT-5.2

Use Cases

Discover the different ways you can use GPT-5.2 to achieve great results.

Complex Engineering Refactors

Performing deep refactoring on performance-critical codebases while maintaining strict type invariants and architectural consistency.

Autonomous Terminal Tasks

Executing multi-step CLI workflows and managing complex cloud deployments through high performance on Terminal-Bench environments.

PhD-Level Knowledge Synthesis

Analyzing hundreds of technical sources and academic papers simultaneously to create comprehensive research reports on niche scientific topics.

Concurrency Bug Resolution

Identifying and fixing subtle race conditions or memory leaks that require high-level logical inference over long code segments.

Mechanical Code Processing

Handling large-scale, repetitive coding migrations across entire repositories without the laziness often observed in general-purpose LLMs.

Senior Technical Review

Acting as a virtual senior engineer to review design plans and identify edge cases in logic for production systems.

Strengths

Superior Engineering Accuracy: Achieved a 77.3% score on Terminal-Bench 2.0, outperforming competitors in complex command-line interface tasks.
Elite Mathematical Reasoning: Scored 100% on the AIME 2025 benchmark, demonstrating a capacity for competition-level math without external tools.
Low Hallucination Rate: Community testing and internal benchmarks show a 30% reduction in factual fabrication compared to previous flagship generations.
Extended Task Persistence: Capable of sustaining active autonomous work sessions for over two hours, making it ideal for large-scale development work.

Limitations

High Response Latency: The significant reasoning overhead makes the model noticeably slower than previous iterations, leading to long wait times.
Artificial UX Tone: Critiqued by users for a pretentious, overly structured helpfulness that feels less natural than the Claude series.
Opaque Thought Process: Unlike some transparent reasoning models, GPT-5.2 often hides its internal chain-of-thought, providing only the final verified answer.
Premium Reasoning Costs: The $14.00 output price can scale quickly during long reasoning tasks, where high volumes of thinking tokens are charged.

API Quick Start

Model ID: openai/gpt-5.2

OpenAI SDK (JavaScript)
import OpenAI from 'openai';

// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

async function solveCodeProblem() {
  const response = await openai.chat.completions.create({
    model: 'gpt-5.2',
    messages: [
      { role: 'user', content: 'Debug this race condition in my Rust service.' },
    ],
    reasoning_effort: 'high', // 'medium' | 'high' | 'xhigh'
    temperature: 0,
  });
  console.log(response.choices[0].message.content);
}

solveCodeProblem().catch(console.error);

Install the SDK and start making API calls in minutes.
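For logging or unit testing without network access, the same call can be expressed as a plain request body. The `run_tests` tool definition below is a hypothetical illustration of the tool-calling shape, not part of any official API:

```javascript
// Build a chat-completions-style request body without the SDK.
// The `run_tests` tool is a hypothetical example for illustration.
function buildRequest(userPrompt, effort = 'high') {
  return {
    model: 'gpt-5.2',
    reasoning_effort: effort, // 'medium' | 'high' | 'xhigh'
    messages: [{ role: 'user', content: userPrompt }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'run_tests',
          description: 'Run the project test suite and return failures.',
          parameters: {
            type: 'object',
            properties: { path: { type: 'string' } },
            required: ['path'],
          },
        },
      },
    ],
  };
}
```

Separating payload construction from transport this way makes the request shape easy to assert on in tests before any tokens are spent.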

Community Feedback

See what the community thinks about GPT-5.2

GPT 5.2 in Codex is a very huge improvement, it's more willing to handle those mechanical tasks that would normally make models lazy.
ArchMeta1868
reddit
The increased deliberation and time spent fact-checking its output is to be commended... the reliability is much improved.
Thomas Randall
techopedia
The model powering deep research showcased a human-like approach by effectively seeking out specialized information when necessary.
OpenAI Official
twitter
OpenAI's focus on structured 'user care' feels like a corporate mask for a cold core compared to the natural discussions in Claude.
Anonymous Developer
hackernews
Finally a model that doesn't get lazy halfway through a 500-line refactor.
CodeWizard
reddit
The reasoning effort parameter is the real MVP for complex logic problems.
AIBuilder
twitter

Related Videos

Watch tutorials, reviews, and discussions about GPT-5.2

This is actually insane. Look at this one shot.

The design I'm not super impressed with with GPT 5.2... it did much worse than Gemini 3.

The context recall is nearly perfect across the whole 400k range.

It feels much more like a reasoning engine than a chatbot.

The latency is the only real dealbreaker for some real-time apps.

GPT 5.2 can now create fully formatted spreadsheets and slide decks directly inside chat GPT.

It's like the model finally grew up and started taking its job seriously.

Use the high reasoning setting only for logic-heavy tasks.

The hallucinations are down significantly compared to the 4o series.

Agentic workflows are finally viable without constant babysitting.

GPT 5.2 is actually 40% more expensive than 5.1, but it's still significantly cheaper than Opus.

GPT 5.2 took 11 minutes and 20 seconds [to build the app]. So double the amount of time [compared to Opus].

The output quality is much higher when you allow the thinking mode to run.

It handled the multi-file refactor without losing the type definitions.

If you need raw speed, this isn't the model for you.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of GPT-5.2 and achieve better results.

Leverage Thinking Effort

Use the reasoning_effort parameter (medium, high, xhigh) to match the model's deliberation time to the complexity of the task.
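One way to apply this tip is a small routing helper that picks an effort level per task category. The categories and mappings below are illustrative defaults, not an official taxonomy:

```javascript
// Map a task category to a reasoning_effort value.
// Categories are illustrative; tune the table for your workload.
const EFFORT_BY_TASK = {
  formatting: 'medium', // mechanical edits, renames, migrations
  review: 'high',       // code review, design critique
  debugging: 'xhigh',   // race conditions, memory leaks
  proofs: 'xhigh',      // competition math, invariant checks
};

function effortFor(task) {
  // Unknown tasks default to the cheapest listed tier.
  return EFFORT_BY_TASK[task] ?? 'medium';
}
```

Routing this way keeps thinking-token spend concentrated on the tasks that actually benefit from extra deliberation.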

Enable Codex for Persistence

When working on large repos, use the dedicated Codex environment to maintain active processing sessions for up to 150 minutes.

Spoon-feed Context

Provide rich background documentation in system prompts as the model performs best when interviewed about the context it needs.
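In practice, "spoon-feeding" means assembling your background docs into the system message up front rather than hoping the model asks. A minimal sketch, where the doc titles and bodies are placeholders for your own material:

```javascript
// Concatenate background docs into one system prompt.
// Titles and bodies are placeholders for your own documentation.
function buildSystemPrompt(docs) {
  return docs.map((d) => `## ${d.title}\n${d.body}`).join('\n\n');
}

function buildMessages(docs, userTask) {
  return [
    { role: 'system', content: buildSystemPrompt(docs) },
    { role: 'user', content: userTask },
  ];
}
```

With a 400K-token window, front-loading architecture notes, style guides, and relevant interfaces is usually cheaper than a multi-turn interview about missing context.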

Iterate on Requirements

Explicitly instruct the model to perform verification checks against the current codebase to ensure requirements are validated.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

GLM-5 (Zhipu)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context · $1.00 / $3.20 per 1M tokens

Gemini 3.1 Flash-Lite (Google)

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context · $0.25 / $1.50 per 1M tokens

Kimi K2 Thinking (Moonshot)

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...

256K context · $0.60 / $2.50 per 1M tokens

Claude Opus 4.5 (Anthropic)

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context · $5.00 / $25.00 per 1M tokens

GPT-5.4 (OpenAI)

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

1M context · $2.50 / $15.00 per 1M tokens

Grok-4 (xAI)

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context · $3.00 / $15.00 per 1M tokens

Kimi K2.5 (Moonshot)

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context · $0.60 / $3.00 per 1M tokens

GPT-5.1 (OpenAI)

GPT-5.1 is OpenAI's advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

400K context · $1.25 / $10.00 per 1M tokens

Frequently Asked Questions

Find answers to common questions about GPT-5.2