openai

GPT-5.1

GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

openai · GPT-5 · 2025-11-13
Context: 400K tokens
Max Output: 128K tokens
Input Price: $1.25 / 1M tokens
Output Price: $10.00 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
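
As a rough illustration of how the listed prices translate into per-request cost, the sketch below multiplies token counts by the input and output prices shown above. The helper name `estimateCostUSD` is hypothetical and not part of any SDK, and the sketch ignores any discounts or extra charges a provider may apply.

```javascript
// Prices copied from the listing above (USD per 1M tokens).
const PRICE_PER_MILLION = { input: 1.25, output: 10.0 };

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUSD(inputTokens, outputTokens) {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens * PRICE_PER_MILLION.input) / 1_000_000;
  const outputCost = (outputTokens * PRICE_PER_MILLION.output) / 1_000_000;
  return inputCost + outputCost;
}

// Example: a 200K-token prompt with a 50K-token answer.
console.log(estimateCostUSD(200_000, 50_000).toFixed(2)); // prints "0.75"
```

Output tokens cost eight times as much as input tokens at these prices, so long generations dominate the bill well before the prompt does.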
Benchmarks
GPQA
88.1%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GPT-5.1 scored 88.1% on this benchmark.
HLE
32.5%
HLE: Humanity's Last Exam. A benchmark of expert-written questions across many specialized domains. Tests expert-level reasoning and deep understanding of complex topics that require professional-level knowledge. GPT-5.1 scored 32.5% on this benchmark.
MMLU
90.2%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GPT-5.1 scored 90.2% on this benchmark.
MMLU Pro
81%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GPT-5.1 scored 81% on this benchmark.
SimpleQA
52%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GPT-5.1 scored 52% on this benchmark.
IFEval
91%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GPT-5.1 scored 91% on this benchmark.
AIME 2025
94%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GPT-5.1 scored 94% on this benchmark.
MATH
91%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GPT-5.1 scored 91% on this benchmark.
GSM8k
98.5%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GPT-5.1 scored 98.5% on this benchmark.
MGSM
95%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GPT-5.1 scored 95% on this benchmark.
MathVista
75%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GPT-5.1 scored 75% on this benchmark.
SWE-Bench
76.3%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GPT-5.1 scored 76.3% on this benchmark.
HumanEval
92.5%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GPT-5.1 scored 92.5% on this benchmark.
LiveCodeBench
74%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GPT-5.1 scored 74% on this benchmark.
MMMU
85.4%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GPT-5.1 scored 85.4% on this benchmark.
MMMU Pro
62%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GPT-5.1 scored 62% on this benchmark.
ChartQA
89%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. GPT-5.1 scored 89% on this benchmark.
DocVQA
93%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. GPT-5.1 scored 93% on this benchmark.
Terminal-Bench
58%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GPT-5.1 scored 58% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GPT-5.1 scored 12% on this benchmark.

About GPT-5.1

Learn about GPT-5.1's capabilities, features, and how it can help you achieve better results.

A New Frontier in Reasoning

GPT-5.1 represents a significant evolution in OpenAI's frontier models, marking the first release in which every model in the lineup features native reasoning capabilities. The update moves away from the clinical feel of previous iterations toward a warmer, more intuitive user experience, while keeping deliberate, System 2-style thinking. With adaptive reasoning, GPT-5.1 dynamically decides how much processing time a prompt needs, which lets it work through complex PhD-level science and math problems that require multi-step logical deduction.

Multimodality and Personalization

The model is built on an omni multimodal architecture that supports text and vision, with significantly improved memory systems and enhanced instruction following. It introduces style and trait settings that let users steer the model's personality, from professional and academic to more casual and expressive tones. These updates help the model retain personal context and adhere to complex user requirements over long-horizon tasks, particularly in agentic software engineering.

Use Cases for GPT-5.1

Discover the different ways you can use GPT-5.1 to achieve great results.

Software Refactoring

Planning top-down redesigns of legacy applications with over 100,000 lines of code via GPT-5.1 Codex.

Math Olympiad Solving

Solving competition-style math problems with integer and symbolic reasoning, backed by a 94% score on AIME 2025.

Technical Specification Mapping

Identifying and explaining complex column structures for niche database tables from visual or text inputs.

Advanced Logical Inference

Developing internally consistent narratives for complex world-building and alternate history fiction.

AI Integration Proposals

Generating professional, data-backed presentations for integrating agentic systems into production environments.

Strategic Architectural Review

Analyzing multi-step project structures to create phase-based implementation plans and risk assessments.

Strengths

Adaptive Reasoning Integration: Dynamically scales compute effort, spending almost twice as long as GPT-5 on the hardest 10% of questions.
SOTA Mathematics Performance: Achieves a 94% score on AIME 2025, setting a new industry standard for competition-level math.
Enhanced Emotional Intelligence: Addresses earlier feedback that responses felt clinical, with significantly improved warmth and a more intuitive conversational tone.
High Capacity Output: Supports a massive 128,000 output token limit, enabling long-form generation and large code refactors.

Limitations

Reasoning Latency: Deep thinking tasks result in significantly slower response times compared to standard interactive models.
Safety Over-Correction: The model can exhibit neurotic behavior or add clinical disclaimers when discussing sensitive social topics.
Identity Gaslighting: Frequent disclaimers about not being a real person can interrupt genuine connections with users.
Switching Inconsistency: The transition between Instant and Thinking modes via the auto-switcher can sometimes feel jarring.
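
To make the 128,000-token output limit concrete, the sketch below checks how much output a request can ask for given the size of its prompt. It assumes the 400K context window is shared between input and output, which the listing does not state explicitly. `planOutputBudget` is a hypothetical helper, not an SDK function.

```javascript
// Limits taken from the listing; the shared-window behavior is an assumption.
const CONTEXT_WINDOW = 400_000;
const MAX_OUTPUT = 128_000;

// Hypothetical helper: caps the requested output at what the limits allow.
function planOutputBudget(inputTokens, requestedOutputTokens) {
  const remaining = CONTEXT_WINDOW - inputTokens;
  if (remaining <= 0) {
    // The prompt alone fills (or exceeds) the window.
    return { fits: false, maxOutputTokens: 0 };
  }
  const maxOutputTokens = Math.min(requestedOutputTokens, MAX_OUTPUT, remaining);
  return { fits: maxOutputTokens === requestedOutputTokens, maxOutputTokens };
}

console.log(planOutputBudget(100_000, 128_000)); // full output budget fits
console.log(planOutputBudget(350_000, 128_000)); // capped by the remaining window
```

Under this assumption, a prompt larger than 272,000 tokens starts eating into the maximum output budget.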

API Quick Start

openai/gpt-5.1

openai SDK
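import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    model: "gpt-5.1",
    messages: [
      { role: "system", content: "You are a reasoning assistant." },
      { role: "user", content: "Analyze this complex physics problem." }
    ],
    // Ask for deeper reasoning on a hard problem.
    reasoning_effort: "high"
  });

  // Print the assistant message (role and content).
  console.log(completion.choices[0].message);
}

// Log API or network errors instead of leaving the rejection unhandled.
main().catch(console.error);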

Install the SDK and start making API calls in minutes.

What People Are Saying About GPT-5.1

See what the community thinks about GPT-5.1

"GPT-5.1 Thinking now more effectively adjusts its thinking time based on the question"
OpenAI
x
"GPT-5 Pro is absolutely SOTA in this area [math]"
ArchMeta1868
reddit
"I've got you, Ron — that's totally normal, especially with everything you've got going on lately"
Tamay Besiroglu
x
"GPT-5.1 Codex Max fixed it instantly. OpenAI still runs the coding game"
BradAI
x
"The reasoning depth is frighteningly good for system architecture"
CodeKing
hackernews
"It actually feels like it knows me now with the memory update"
User445
twitter

Videos About GPT-5.1

Watch tutorials, reviews, and discussions about GPT-5.1

Compared to GPT5, it will think for almost twice as long for what it thinks are the top 10% hardest questions

GPT 5.1 auto... the miniature model that decides whether your query is worth spending time over

This dynamic compute scaling is exactly what we needed for serious research

The output length is insane, you can actually build whole apps in one go

OpenAI is definitely leaning into the reasoning-first strategy here

For the first time ever, all of the models in chat are reasoning models

This model's expressive range is much more wide

We wanted to give the model a sense of personality that isn't just clinical

Users can now influence traits like optimism or skepticism through system settings

The reasoning effort is something the user can now control directly

GPT 5.1 Codex, for the coders among you, will be a fairly strict improvement

Claude frequently overstated its findings and occasionally fabricated data... GPT-5.1 is more honest

The AIME 2025 scores are a massive jump over standard GPT-5

It's slower, yes, but the quality of the 'Thinking' trace is superior

The context window management seems much tighter than the competition

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Set Reasoning Effort

Manually set the reasoning_effort parameter to high for complex logic or none for instant conversational tasks.
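
One way to apply this tip is to choose the effort level per request. The sketch below is illustrative only: `pickReasoningEffort`, `buildRequest`, and the `needsDeepReasoning` flag are hypothetical names, and the resulting object is meant to be passed to `openai.chat.completions.create` as in the API Quick Start.

```javascript
// Hypothetical helper: map a task to one of the two effort values named in the tip.
function pickReasoningEffort(task) {
  return task.needsDeepReasoning ? "high" : "none";
}

// Hypothetical helper: build the request body for a chat completion call.
function buildRequest(task) {
  return {
    model: "gpt-5.1",
    messages: [{ role: "user", content: task.prompt }],
    reasoning_effort: pickReasoningEffort(task),
  };
}

const chat = buildRequest({ prompt: "Say hi in French.", needsDeepReasoning: false });
const proof = buildRequest({
  prompt: "Prove that sqrt(2) is irrational.",
  needsDeepReasoning: true,
});

console.log(chat.reasoning_effort, proof.reasoning_effort); // prints "none high"
```

Keeping the decision in one small function makes it easy to trade latency for depth without touching the rest of the calling code.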

Leverage Persona Styles

Use the new style and trait settings to toggle between Professional, Candid, and Quirky tones.

Manage Active Memory

Regularly review and manage saved memories to ground the model's warm responses in correct personal context.

Verify Citations

Since the model cites sources, cross-reference its technical output against cited documentation for high-stakes tasks.

Related AI Models

Grok-4 (xai)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context · $3.00/$15.00/1M

Claude Opus 4.5 (anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context · $5.00/$25.00/1M

Gemini 3 Flash (google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context · $0.50/$3.00/1M

GLM-4.7 (zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context · $0.60/$2.20/1M

Claude 3.7 Sonnet (anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.
200K context · $3.00/$15.00/1M

Grok-3 (xai)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
128K context · $3.00/$15.00/1M

Gemini 3 Pro (google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context · $2.00/$12.00/1M

Kimi K2 Thinking (moonshot)
Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...
256K context · $0.15/1M
