
GPT-5.5

GPT-5.5 is OpenAI's flagship frontier model with a 1M context window and five reasoning effort levels, optimized for autonomous agentic workflows and coding.

Agentic AI · OpenAI · GPT-5 · Autonomous Coding · Frontier Models
OpenAI · GPT-5 · April 23, 2026
Context: 1.0M tokens
Max Output: 128K tokens
Input Price: $5.00 / 1M tokens
Output Price: $30.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
93.6%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GPT-5.5 scored 93.6% on this benchmark.
HLE
52.2%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized domains, designed to probe the limits of professional-level knowledge and reasoning. GPT-5.5 scored 52.2% on this benchmark.
MMLU
92.5%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GPT-5.5 scored 92.5% on this benchmark.
MMLU Pro
88.1%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GPT-5.5 scored 88.1% on this benchmark.
SimpleQA
57%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GPT-5.5 scored 57% on this benchmark.
IFEval
92.1%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GPT-5.5 scored 92.1% on this benchmark.
AIME 2025
100%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GPT-5.5 scored 100% on this benchmark.
MATH
98%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GPT-5.5 scored 98% on this benchmark.
GSM8k
98.5%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GPT-5.5 scored 98.5% on this benchmark.
MGSM
96.4%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GPT-5.5 scored 96.4% on this benchmark.
MathVista
76%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GPT-5.5 scored 76% on this benchmark.
SWE-Bench
58.6%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GPT-5.5 scored 58.6% on this benchmark.
HumanEval
94.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GPT-5.5 scored 94.2% on this benchmark.
LiveCodeBench
78%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GPT-5.5 scored 78% on this benchmark.
MMMU
88.3%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GPT-5.5 scored 88.3% on this benchmark.
MMMU Pro
62%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GPT-5.5 scored 62% on this benchmark.
ChartQA
94%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. GPT-5.5 scored 94% on this benchmark.
DocVQA
95%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. GPT-5.5 scored 95% on this benchmark.
Terminal-Bench
82.7%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GPT-5.5 scored 82.7% on this benchmark.
ARC-AGI
85%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GPT-5.5 scored 85% on this benchmark.

About GPT-5.5

Learn about GPT-5.5's capabilities, features, and how it can help you achieve better results.

Transition to Agentic Intelligence

GPT-5.5 represents the transition from large language models to large agentic models. It is designed to function as an autonomous teammate rather than a simple chatbot, capable of planning, executing, and self-verifying complex workflows across digital environments. The model's primary innovation is the implementation of variable reasoning effort levels, which gives developers granular control over the model's thinking time and associated compute costs.

Technical Efficiency and Vision

Technically, GPT-5.5 maintains the 1-million-token context window of the GPT-5 family but introduces a 40% gain in token efficiency. This means that while per-token pricing has doubled relative to the 5.4 series, the effective cost for complex tasks is only 20% higher. The model's vision capabilities have also been significantly upgraded, now reaching near-human performance on technical diagrams and spatial reasoning tasks like ARC-AGI v2.
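
The arithmetic behind that claim can be checked directly. The figures below are the ones quoted on this page (2x per-token price, 40% fewer tokens), not an official pricing formula:

```javascript
// Effective-cost check for the efficiency claim above. Figures are taken
// from this page, not from any official OpenAI pricing math.
const priceMultiplier = 2.0;       // per-token price doubled vs. the 5.4 series
const tokenMultiplier = 1 - 0.40;  // 40% token-efficiency gain → 60% of the tokens

const effectiveCost = priceMultiplier * tokenMultiplier;
console.log(effectiveCost); // 1.2 → complex tasks cost ~20% more, not 2x
```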

Optimization for Autonomy

GPT-5.5 is particularly effective for autonomous coding, where it can manage entire repositories and verify its own bug fixes. The new reasoning_effort parameter lets users toggle between five distinct logic depths, making it the first model to offer a sliding scale of intelligence for high-stakes problem solving.
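
As a sketch of how the five levels might be mapped to workloads: only "xhigh" is named on this page, so the other four level names below are assumptions modeled on the existing reasoning_effort values.

```javascript
// Hypothetical mapping from task type to reasoning effort. Only "xhigh"
// is confirmed on this page; the other four level names are assumptions.
const EFFORT_LEVELS = ["minimal", "low", "medium", "high", "xhigh"];

function effortFor(task) {
  if (task.kind === "chat") return "minimal";     // latency-sensitive, cheap
  if (task.kind === "summarize") return "low";
  if (task.kind === "extraction") return "medium";
  if (task.kind === "code-review") return "high";
  return "xhigh";                                 // math, architecture, agents
}

console.log(effortFor({ kind: "code-review" })); // "high"
```

The useful property of a helper like this is that cost and latency become a per-request decision rather than a per-model one.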

GPT-5.5

Use Cases

Discover the different ways you can use GPT-5.5 to achieve great results.

Autonomous Software Engineering

Managing entire code repositories, fixing bugs, and deploying updates without human oversight.

Scientific Research Analysis

Processing thousands of research papers across a 1M window to synthesize novel hypotheses.

Complex Financial Modeling

Building and auditing intricate corporate finance structures with PhD-level mathematical precision.

Multi-Step Agentic Workflows

Creating and executing recursive task lists to achieve long-term digital objectives autonomously.

Technical Visual Analysis

Interpreting complex engineering blueprints and circuit diagrams for automated quality assurance.

High-Fidelity Data Compression

Converting massive datasets into token-dense summaries that preserve deep semantic nuances.

Strengths

Elite Agentic Performance: Achieves an industry-leading 82.7% on Terminal-Bench 2.0 for computer-use and terminal tasks.
Massive Context Window: Supports a 1M-token input context, enabling analysis of full code repositories and large research corpora.
Perfect Math Reasoning: Attained a perfect 100% on the AIME 2025 olympiad-level mathematical reasoning benchmark.
Flexible Reasoning Effort: Offers 5 distinct reasoning effort levels, letting developers balance latency, cost, and intelligence.

Limitations

High Hallucination Rate: Exhibits an 86% hallucination rate on factual knowledge benchmarks despite strong reasoning capabilities.
Premium Pricing Strategy: At $5/$30 per 1M tokens, it is significantly more expensive than previous generations and open-source rivals.
Missing Video Input: Unlike multimodal competitors such as Gemini, GPT-5.5 lacks native video-to-text processing.
Creative Writing Gaps: Benchmark performance in creative writing and poetic expression trails Anthropic's flagship models.

API Quick Start

openai/gpt-5.5

OpenAI SDK (Node.js)
import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-5.5",
    messages: [
      { role: "system", content: "You are an autonomous coding agent." },
      { role: "user", content: "Debug this Python repository and verify the fixes." }
    ],
    // One of the five effort levels; "xhigh" maximizes thinking depth (and cost).
    reasoning_effort: "xhigh"
  });

  console.log(response.choices[0].message.content);
}

main();

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about GPT-5.5

The hallucination rate is wild though, 86% on facts? It's like a genius who refuses to say 'I don't know'.
@ArtificialAnlys
twitter
GPT-5.5 Pro is $180/mil output. We've officially entered the luxury era of AI.
@skeptrune
twitter
The proto-AGI era has arrived. It's no longer a chatbot; it's a teammate.
lostlifon
reddit
The reasoning ladder with 5 effort levels is the most useful feature release since function calling.
DataLearnerAI
hackernews
OpenAI cooked with this one. It's expensive, but it actually works for high-end agentic work.
David Ondrej
youtube
Across 20 benchmarks GPT-5.5 scores slightly higher than Opus 4.7 but it is also now $5/million tokens.
@rxhit05
twitter

Related Videos

Watch tutorials, reviews, and discussions about GPT-5.5

The reasoning ability on this model is just night and day compared to anything we've seen before.

It literally built a whole SaaS application in one go without me having to fix a single bug.

At $5 per million tokens, you really have to be sure you need this level of intelligence.

Comparing this to open models, there is still a significant gap in agentic autonomy.

The reasoning effort parameters are the real story here for developers.

OpenAI cooked with this one. It's expensive, but it actually works for high-end agentic work.

The visual understanding of UI layouts is perfectly accurate now.

It manages its own state across multiple steps much better than GPT-5.4.

You can basically hand it a terminal and let it work for twenty minutes.

The pricing is steep, but the time saved on debugging is worth it.

The context window being a full million tokens is a game changer for long document analysis.

If you're building autonomous agents, this is currently the only model that feels truly autonomous.

I noticed a high hallucination rate on very specific historical facts.

The efficiency gains mean you use fewer tokens for the same complex task.

It is a specialized tool for developers more than a casual chatbot.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of GPT-5.5 and achieve better results.

Use Reasoning Effort xhigh

Set the reasoning_effort parameter to 'xhigh' for logic-heavy tasks like math and architectural design.

Leverage Large Context Window

Provide complete documentation and codebase context in the initial system prompt to take full advantage of the 1M window.
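
One way to front-load that context, as a minimal sketch: the `buildContextPrompt` helper and the file paths are hypothetical, and the file reader is injected (e.g. a wrapper around `fs.readFileSync`) so the helper stays self-contained.

```javascript
// Hypothetical helper: concatenate project files into the system prompt so
// the 1M-token window carries full codebase context from the first turn.
// `readFile` is injected to keep the helper free of filesystem dependencies.
function buildContextPrompt(paths, readFile) {
  const context = paths
    .map((p) => `--- ${p} ---\n${readFile(p)}`)
    .join("\n\n");
  return [
    { role: "system", content: `You are a coding agent. Project context:\n${context}` },
    { role: "user", content: "Refactor the indexing module for clarity." },
  ];
}

const messages = buildContextPrompt(["README.md"], () => "Example project docs.");
console.log(messages[0].content.includes("--- README.md ---")); // true
```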

Implement Self-Critique Loops

Request a recursive review where the model critiques its first output to mitigate the native hallucination rate.
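
A minimal two-pass sketch of such a loop: the prompt wording is illustrative rather than an official pattern, and the model call is injected so the loop itself has no SDK dependency.

```javascript
// Two-pass self-critique sketch. `callModel` is an injected async function
// (e.g. wrapping openai.chat.completions.create) that takes a messages array
// and returns the assistant's reply text. Prompt wording is illustrative.
async function answerWithCritique(question, callModel) {
  // Pass 1: draft an answer.
  const draft = await callModel([{ role: "user", content: question }]);

  // Pass 2: feed the draft back and ask for a critique plus a corrected answer.
  const final = await callModel([
    { role: "user", content: question },
    { role: "assistant", content: draft },
    {
      role: "user",
      content:
        "Critique your answer above: flag any claim you cannot verify, then produce a corrected final answer.",
    },
  ]);
  return final;
}
```

With the real SDK, `callModel` could wrap `openai.chat.completions.create` and set `reasoning_effort: "xhigh"` on the review call, where the extra thinking time matters most.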

Agentic Verification

Utilize the xhigh effort level for agentic tasks to ensure the model self-verifies every step before moving to the next.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of our most used RPA tools, both internally and externally. It saves us countless hours of work, and we realized it could do the same for other startups, so we chose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years; Automatio is the jack of all trades! It can be your scraping bot in the morning, your VA by noon, and in the evening it runs your automations. It's amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use for extracting data from any website. It allowed me to replace a developer and do tasks myself, since they take only a few minutes to set up and forget about. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Grok-3

xAI

Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.

1M context
$3.00/$15.00/1M
Gemini 3.1 Flash Live Preview

Google

Gemini 3.1 Flash Live Preview is Google's ultra-low-latency, audio-to-audio model featuring a 131K context window, high-fidelity multimodal reasoning, and...

131K context
$0.75/$4.50/1M
GPT-5.2 Pro

OpenAI

GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.

400K context
$21.00/$168.00/1M
Claude Opus 4.7

Anthropic

Claude Opus 4.7 is Anthropic's flagship model with a 1-million-token context, adaptive reasoning, and 3.3x vision resolution for enterprise-scale agents.

1M context
$5.00/$25.00/1M
Gemini 3.1 Pro

Google

Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.

1M context
$2.00/$12.00/1M
Gemini 3 Pro

Google

Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.

1M context
$2.00/$12.00/1M
Claude Opus 4.6

Anthropic

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

1M context
$5.00/$25.00/1M
Kimi k2.6

Moonshot

Kimi k2.6 is Moonshot AI's 1T-parameter MoE model featuring a 256K context window, native video input, and elite performance in autonomous agentic coding.

256K context
$0.95/$4.00/1M

Frequently Asked Questions

Find answers to common questions about GPT-5.5