GPT-5.4

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

Provider: OpenAI
Family: GPT-5
Released: March 5, 2026
Tags: 1M Context, Reasoning, Multimodal
Context: 1.05M tokens
Max Output: 128K tokens
Input Price: $2.50 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image, Audio
Capabilities: Vision, Tools, Streaming, Reasoning
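As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the input and output rates shown above. It assumes flat per-token pricing; any cached-input discounts or long-context surcharges are not listed on this page and are ignored. The function name is illustrative only.

```javascript
// Listed prices from this page, in US dollars per 1M tokens.
const INPUT_PRICE_PER_M = 2.5;
const OUTPUT_PRICE_PER_M = 15.0;

// Estimate the cost of one request from its token counts.
// Assumption: flat pricing, no discounts or surcharges.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_M;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// A 200K-token prompt with an 8K-token answer.
console.log(estimateCostUSD(200_000, 8_000).toFixed(2)); // prints 0.62
```

At these rates a prompt that fills the whole context window costs a few dollars in input alone, so it is worth estimating before sending.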
Benchmarks
GPQA
84%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GPT-5.4 scored 84% on this benchmark.
HLE
36.6%
HLE: Humanity's Last Exam. A benchmark of expert-written questions across many specialized domains. Tests expert-level reasoning on complex topics that require professional-level knowledge. GPT-5.4 scored 36.6% on this benchmark.
MMLU
91.2%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GPT-5.4 scored 91.2% on this benchmark.
MMLU Pro
83.1%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GPT-5.4 scored 83.1% on this benchmark.
SimpleQA
58%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and the tendency to hallucinate in knowledge retrieval tasks. GPT-5.4 scored 58% on this benchmark.
IFEval
95.2%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GPT-5.4 scored 95.2% on this benchmark.
AIME 2025
88%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GPT-5.4 scored 88% on this benchmark.
MATH
92.4%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GPT-5.4 scored 92.4% on this benchmark.
GSM8k
98.6%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GPT-5.4 scored 98.6% on this benchmark.
MGSM
98%
MGSM: Multilingual Grade School Math. A set of 250 GSM8k problems translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GPT-5.4 scored 98% on this benchmark.
MathVista
78.4%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GPT-5.4 scored 78.4% on this benchmark.
SWE-Bench
57.7%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GPT-5.4 scored 57.7% on this benchmark.
HumanEval
94.5%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GPT-5.4 scored 94.5% on this benchmark.
LiveCodeBench
78.3%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GPT-5.4 scored 78.3% on this benchmark.
MMMU
84%
MMMU: Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GPT-5.4 scored 84% on this benchmark.
MMMU Pro
65.2%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GPT-5.4 scored 65.2% on this benchmark.
ChartQA
92.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. GPT-5.4 scored 92.5% on this benchmark.
DocVQA
95%
DocVQA: Document Visual Question Answering. Tests the ability to extract and reason about information from document images including forms, reports, and scanned text. GPT-5.4 scored 95% on this benchmark.
Terminal-Bench
60%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GPT-5.4 scored 60% on this benchmark.
ARC-AGI
14.2%
ARC-AGI: Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GPT-5.4 scored 14.2% on this benchmark.

About GPT-5.4

Learn about GPT-5.4's capabilities, features, and how it can help you achieve better results.

The Frontier of Long-Context Reasoning

GPT-5.4 represents the high-performance evolution of the GPT-5 series. It features an industry-leading 1.05-million-token context window. This model handles expansive datasets, such as massive code repositories or multi-year historical logs, without losing reasoning fidelity. Its interactive Mid-Response Steering lets users monitor and adjust the model's thinking plan in real time, so the output stays aligned with complex, multi-step intents.

Unified Intelligence and Autonomous Action

Technically, GPT-5.4 unifies the world-class coding strengths of previous Codex branches with the creative nuances of the standard GPT-5 series. It features a specialized Thinking mode with adjustable effort levels. These include Standard, Extended, and Heavy modes. It utilizes reinforced chain-of-thought processing to solve PhD-level science and logic problems. Beyond text, GPT-5.4 introduces native computer use capabilities. It achieves a 75% score on OSWorld-Verified tasks by interpreting visual screenshots and executing coordinate-based clicks.
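The coordinate-based clicks mentioned above can be illustrated with a small helper. This is a hypothetical sketch, not the model's actual interface: it assumes the agent sees a downscaled screenshot and returns a click position in screenshot pixels, which the caller must map back to the real screen. The function and parameter names are invented for illustration.

```javascript
// Map a click from screenshot space to screen space.
// Assumption: the screenshot shows the full screen, only scaled.
function toScreenCoords(click, screenshotSize, screenSize) {
  const scaleX = screenSize.width / screenshotSize.width;
  const scaleY = screenSize.height / screenshotSize.height;
  // Round to whole pixels and clamp so the click never leaves the screen.
  const x = Math.round(click.x * scaleX);
  const y = Math.round(click.y * scaleY);
  return {
    x: Math.min(screenSize.width - 1, Math.max(0, x)),
    y: Math.min(screenSize.height - 1, Math.max(0, y)),
  };
}

// A click at the center of a 1280x720 screenshot on a 2560x1440 display.
console.log(
  toScreenCoords({ x: 640, y: 360 }, { width: 1280, height: 720 }, { width: 2560, height: 1440 })
); // { x: 1280, y: 720 }
```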

Efficiency and Reliability

OpenAI reports a 33% decrease in claim-level errors compared to predecessors. This makes GPT-5.4 a primary choice for autonomous agents and high-stakes decision support. It is engineered for token and energy efficiency. This allows for cheaper long-context processing than previous iterations. Whether managing an entire enterprise codebase or acting as an autonomous scheduling agent, GPT-5.4 sets a new standard for reliability and agentic performance.

Use Cases

Discover the different ways you can use GPT-5.4 to achieve great results.

Large-Scale Code Refactoring

Systematically rewriting legacy codebases exceeding 300,000 lines with strict adherence to architectural standards.

Autonomous Financial Modeling

Building complex three-statement models where the AI reconciles income statements, balance sheets, and cash flows.

Interactive System Design

Developing 3D simulations or physics-based games by steering the model's logic path during the generation process.

Agentic Computer Use

Executing multi-step desktop tasks such as bulk data entry, email management, and software testing via native UI interaction.

Long-Context Legal Analysis

Cross-referencing hundreds of legal documents to identify inconsistencies or extract specific clauses with high recall accuracy.

PhD-Level Research Support

Solving complex mathematical proofs and scientific problems using Heavy Thinking mode for verified logical chains.

Strengths

Massive 1.05M Context: Provides industry-leading capacity for deep analysis of enormous codebases and document sets without context decay.
Interactive Thinking: Unique mid-response navigation allows users to steer reasoning paths, significantly reducing wasted generations and tokens.
Native Computer Use: High-accuracy UI interaction (75% on OSWorld) enables the model to work directly within desktop and browser environments.
Extreme Token Efficiency: Optimized architecture delivers 2026-frontier performance with lower latency and energy consumption than previous GPT-5 versions.

Limitations

Reasoning Latency: Enabling Heavy Thinking mode can result in wait times of several minutes for complex logic or large code generations.
Rate Limiting: During the initial rollout, users may encounter aggressive message limits or temporary account bugs as capacity scales.
Non-Linear Scaling: In some creative tasks, lighter reasoning modes have been found to outperform heavy modes in aesthetic detail.
Context Rot at 1M: While the window is large, retrieval accuracy drops significantly when moving from 256K to 1M tokens.
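
Given the Context Rot limitation, it can help to check a prompt's size before sending it. The sketch below is a minimal illustration: the two thresholds come from this page, while the four-characters-per-token ratio is only a rough heuristic for English text and an assumption here. A real tokenizer would give exact counts.

```javascript
const CONTEXT_WINDOW = 1_050_000;     // window size stated on this page
const RETRIEVAL_SAFE_LIMIT = 256_000; // page reports weaker retrieval beyond this
const CHARS_PER_TOKEN = 4;            // assumption: rough heuristic, not a tokenizer

// Approximate the token count from the character count.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Classify a prompt as fitting comfortably, fitting with reduced
// retrieval accuracy, or not fitting at all.
function classifyPrompt(text) {
  const tokens = estimateTokens(text);
  if (tokens > CONTEXT_WINDOW) return { tokens, status: "too_large" };
  if (tokens > RETRIEVAL_SAFE_LIMIT) return { tokens, status: "fits_but_degraded" };
  return { tokens, status: "ok" };
}

console.log(classifyPrompt("Summarize the attached logs.").status); // ok
```

A prompt in the middle band could be split into smaller requests or reduced to the most relevant documents.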

API Quick Start

openai/gpt-5.4

openai SDK
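import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    model: "gpt-5.4",
    messages: [
      { role: "user", content: "Refactor this controller for better error handling." }
    ],
    // Value as given on this page; confirm the accepted values in the API reference.
    reasoning_effort: "heavy"
  });

  // Print the text of the first returned choice.
  console.log(completion.choices[0].message.content);
}

// Surface API or network errors instead of leaving an unhandled rejection.
main().catch((err) => {
  console.error(err);
  process.exit(1);
});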

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about GPT-5.4

"GPT 5.4 in Codex is a very huge improvement... I've actually seen it work for 150 minutes at once without losing context." (ArchMeta1868, Reddit)
"GPT 5.4's 3D design chops are unmatched. The way it handled transparency and physics in my ship simulator was spookily accurate." (AI_Creative_Daily, Twitter)
"The mid-response course correction is incredible. I can actually see where the model is going and fix it before it wastes tokens." (dev_guru_99, Reddit)
"It beat humans 83% of the time across 44 different jobs. Lawyer. Accountant. Financial analyst. Administrator." (Josh Kale, Twitter)
"OpenAI finally fixed the output bottleneck. 128k output tokens is a dream for developers building full-stack applications." (TheCodeChannel, YouTube)
"The computer use latency is still there, but the precision is high enough to handle complex SAP workflows which is wild." (enterprise_sysadmin, Hacker News)

Related Videos

Watch tutorials, reviews, and discussions about GPT-5.4

GPT 5.4 is here and we may actually have a new best model on the planet.

GPT 5.4 Thinking can now provide an upfront plan of its thinking... allows you to guide the model.

This interactive element solves the black box problem of reasoning models.

The speed compared to o1-preview is night and day for standard tasks.

You are seeing reasoning that actually feels consistent over long conversations.

GPT 5.4... wasn't built to chat. It was built to work.

Deferred loading... reduced total token usage by 47% with no loss in accuracy.

The computer use functionality tracks UI elements with a coordinate-based system.

I tested it with a legacy Java codebase and it actually understood the cross-file dependencies.

We are moving into a world where the AI is the operating system controller.

1 million 50,000 token context window. This is a very long context window.

Navigate it while it is thinking, which is definitely more efficient to use.

The pricing is steep but for large document sets, it's the only model that works.

Thinking mode can be adjusted based on the complexity of your prompt.

It feels more reliable on factual recall than any previous GPT version.

Pro Tips

Expert tips to help you get the most out of GPT-5.4 and achieve better results.

Toggle Thinking Effort

Choose the Standard, Extended, or Heavy setting to balance accuracy against generation speed and cost.
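
One way to apply this tip is to pick the effort level in code rather than by hand. The sketch below is a hypothetical heuristic: the three labels are the mode names used on this page, but the thresholds and input fields are assumptions chosen for illustration, and mapping a label to an actual request parameter is left to the API reference.

```javascript
// Pick a thinking effort label from a few coarse task traits.
// Assumption: thresholds are illustrative, not tuned or official.
function pickThinkingEffort({ needsProof = false, steps = 1, latencySensitive = false } = {}) {
  // Interactive use cases favor speed over depth.
  if (latencySensitive) return "standard";
  // Proof-like or very long tasks get the deepest mode.
  if (needsProof || steps > 10) return "heavy";
  // Moderately multi-step tasks get the middle setting.
  if (steps > 3) return "extended";
  return "standard";
}

console.log(pickThinkingEffort({ steps: 5 })); // extended
```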

Review the Thinking Plan

Monitor the upfront plan provided by the model and use Mid-Response Steering to correct it if the logic deviates.

Leverage Deferred Tool Loading

For agentic workflows, use the deferred loading registry to reduce upfront token costs by up to 47%.
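
The page does not describe how the deferred loading registry works, so the following is only a sketch of the general idea under stated assumptions: tools are announced upfront with a short summary, and the full schema is built the first time a tool is actually needed. The class, method names, and the example tool are hypothetical and are not part of any OpenAI SDK.

```javascript
// Hypothetical registry: short summaries upfront, full schemas on demand.
class DeferredToolRegistry {
  constructor() {
    this.entries = new Map(); // name -> { summary, loadSchema, schema }
  }

  register(name, summary, loadSchema) {
    this.entries.set(name, { summary, loadSchema, schema: null });
  }

  // Cheap listing to include in the initial prompt.
  listSummaries() {
    return [...this.entries].map(([name, entry]) => ({ name, summary: entry.summary }));
  }

  // Build the full schema on first use and cache it afterwards.
  resolve(name) {
    const entry = this.entries.get(name);
    if (!entry) throw new Error(`Unknown tool: ${name}`);
    if (entry.schema === null) entry.schema = entry.loadSchema();
    return entry.schema;
  }
}

const registry = new DeferredToolRegistry();
registry.register("search_invoices", "Find invoices by customer", () => ({
  type: "object",
  properties: { customer: { type: "string" } },
  required: ["customer"],
}));
console.log(registry.listSummaries().length); // 1
```

The saving comes from sending only the summaries until the model selects a tool, at which point the full schema is resolved.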

Use Completeness Contracts

Explicitly define what "finished" means in your prompt to make the model more persistent during long-running tasks.
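
A completeness contract can be as simple as a numbered checklist appended to the task. The helper below is a minimal sketch; the wording of the contract and the function name are assumptions for illustration, not a documented prompt format.

```javascript
// Append an explicit definition of "done" to a task description.
function withCompletenessContract(task, doneCriteria) {
  if (doneCriteria.length === 0) {
    throw new Error("Provide at least one done criterion");
  }
  // Number the criteria so the model can refer to them individually.
  const checklist = doneCriteria.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    task.trim(),
    "",
    "The task is finished only when all of the following are true:",
    checklist,
    "",
    "Do not stop until every item is satisfied.",
  ].join("\n");
}

console.log(
  withCompletenessContract("Migrate the billing module to TypeScript.", [
    "All files under billing/ compile without errors",
    "Existing unit tests pass",
  ])
);
```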

Max Resolution Vision

Upload high-fidelity images up to 10.24M pixels for precise visual inspections of UI elements or technical diagrams.
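
To stay within the 10.24M-pixel figure quoted above, an image's dimensions can be checked and scaled down before upload. The sketch below only computes target dimensions while preserving aspect ratio; the actual resizing would need an image library, and the function name is illustrative.

```javascript
const MAX_PIXELS = 10_240_000; // 10.24M pixels, the limit quoted on this page

// Return dimensions that fit the pixel budget, preserving aspect ratio.
function fitToPixelBudget(width, height, maxPixels = MAX_PIXELS) {
  const pixels = width * height;
  if (pixels <= maxPixels) return { width, height, scaled: false };
  // Scaling both sides by sqrt(budget / pixels) scales the area by budget / pixels.
  const scale = Math.sqrt(maxPixels / pixels);
  return {
    width: Math.floor(width * scale),
    height: Math.floor(height * scale),
    scaled: true,
  };
}

console.log(fitToPixelBudget(3840, 2160).scaled); // false
console.log(fitToPixelBudget(8000, 6000).scaled); // true
```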

Related AI Models

Kimi K2 Thinking (Moonshot)
Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...
Context: 256K. Price: $0.60 input / $2.50 output per 1M tokens.

GPT-5.2 (OpenAI)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
Context: 400K. Price: $1.75 input / $14.00 output per 1M tokens.

GLM-5 (Zhipu)
GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.
Context: 200K. Price: $1.00 input / $3.20 output per 1M tokens.

Gemini 3.1 Flash-Lite (Google)
Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.
Context: 1M. Price: $0.25 input / $1.50 output per 1M tokens.

GPT-5.3 Codex (OpenAI)
GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...
Context: 400K. Price: $1.75 input / $14.00 output per 1M tokens.

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
Context: 200K. Price: $5.00 input / $25.00 output per 1M tokens.

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
Context: 2M. Price: $3.00 input / $15.00 output per 1M tokens.

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
Context: 256K. Price: $0.60 input / $3.00 output per 1M tokens.

Frequently Asked Questions

Find answers to common questions about GPT-5.4