Gemini 3.1 Pro

Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.

Tags: Multimodal · Deep Reasoning · Video Generation · Workspace AI · Google Gemini
Google · Gemini 3 series · Released February 19, 2026
Context: 1.0M tokens
Max Output: 66K tokens
Input Price: $2.50 / 1M tokens
Output Price: $15.00 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming, Reasoning
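At the listed rates, the cost of a single request is easy to estimate. A minimal sketch (the per-token rates are simply the prices above divided by one million; the function name is illustrative):

```javascript
// Quick cost estimate at the listed rates:
// $2.50 per 1M input tokens, $15.00 per 1M output tokens.
function estimateCostUSD(inputTokens, outputTokens) {
  const INPUT_RATE = 2.5 / 1e6;   // USD per input token
  const OUTPUT_RATE = 15.0 / 1e6; // USD per output token
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// e.g. a 900K-token repository prompt with a 30K-token answer:
console.log(estimateCostUSD(900_000, 30_000).toFixed(2)); // "2.70"
```

Note that output tokens dominate cost at a 6:1 rate ratio, so long generations are where spend adds up fastest.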
Benchmarks
GPQA
94.3%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Gemini 3.1 Pro scored 94.3% on this benchmark.
HLE
51.4%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized academic domains, designed to probe reasoning at the edge of professional-level knowledge and to stay difficult as older tests saturate. Gemini 3.1 Pro scored 51.4% on this benchmark.
MMLU
92.6%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Gemini 3.1 Pro scored 92.6% on this benchmark.
MMLU Pro
82.6%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Gemini 3.1 Pro scored 82.6% on this benchmark.
SimpleQA
47%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Gemini 3.1 Pro scored 47% on this benchmark.
IFEval
91.2%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Gemini 3.1 Pro scored 91.2% on this benchmark.
AIME 2025
92%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Gemini 3.1 Pro scored 92% on this benchmark.
MATH
95.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Gemini 3.1 Pro scored 95.1% on this benchmark.
GSM8k
98.2%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Gemini 3.1 Pro scored 98.2% on this benchmark.
MGSM
95.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Gemini 3.1 Pro scored 95.5% on this benchmark.
MathVista
74.8%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Gemini 3.1 Pro scored 74.8% on this benchmark.
SWE-Bench
80.6%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Gemini 3.1 Pro scored 80.6% on this benchmark.
HumanEval
94.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Gemini 3.1 Pro scored 94.2% on this benchmark.
LiveCodeBench
72.5%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Gemini 3.1 Pro scored 72.5% on this benchmark.
MMMU
82.3%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Gemini 3.1 Pro scored 82.3% on this benchmark.
MMMU Pro
80.5%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Gemini 3.1 Pro scored 80.5% on this benchmark.
ChartQA
88.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Gemini 3.1 Pro scored 88.5% on this benchmark.
DocVQA
93.4%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Gemini 3.1 Pro scored 93.4% on this benchmark.
Terminal-Bench
68.5%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Gemini 3.1 Pro scored 68.5% on this benchmark.
ARC-AGI
77.1%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Gemini 3.1 Pro scored 77.1% on this benchmark.

About Gemini 3.1 Pro

Learn about Gemini 3.1 Pro's capabilities, features, and how it can help you achieve better results.

Model Overview

Gemini 3.1 Pro represents a substantial leap in core reasoning within the Gemini 3 series, introducing the DeepThink engine, which leverages reinforcement learning to solve complex logic patterns. It is distinguished by its 1,048,576-token context window, which lets users process an entire software repository or several hours of video in a single prompt. The model is natively multimodal, designed to ingest and reason across text, audio, images, and video simultaneously.

Intelligence and Reasoning

Optimized for the next generation of agentic workflows, Gemini 3.1 Pro excels at tasks requiring strategic planning and step-by-step refinement. By distilling reasoning capabilities from larger models into a highly efficient architecture, it offers frontier-level intelligence at a competitive price point. It leads benchmarks like ARC-AGI v2, demonstrating an emergent ability to solve novel logic puzzles.

Specialized Agentic Tools

Built on the Gemini 3 architecture, it uses chain-of-thought verification to sharply reduce logical errors in complex scientific and mathematical reasoning tasks. The model marks a significant leap in zero-shot capability, particularly in its ability to self-correct during long-horizon inference, making it a strong choice for autonomous software engineering and multimodal synthesis.

Use Cases for Gemini 3.1 Pro

Discover the different ways you can use Gemini 3.1 Pro to achieve great results.

Large-Scale Codebase Navigation

Ingesting entire software repositories to debug, refactor, or explain complex logic across multiple files.

Multimodal Document Synthesis

Extracting and comparing data from thousands of pages of financial reports and technical diagrams.

Autonomous Web Research

Navigating the internet to gather, verify, and synthesize information into structured, consultant-level reports.

Scientific Discovery

Reasoning through PhD-level STEM problems and identifying complex patterns in vast datasets.

Interactive Education

Creating personalized simulations and animated SVGs to teach complex physics concepts through visual play.

Agentic Workflow Execution

Performing multi-step professional tasks autonomously, such as system administration or server migrations.
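The codebase-navigation use case above can be sketched as a simple packing problem: fit as many source files as possible into the 1M-token window. Everything here (the helper name, the rough 4-characters-per-token heuristic) is illustrative, not part of any official SDK:

```javascript
// Hypothetical helper: greedily pack source files into one prompt while
// staying under the 1M-token context window. The 4-chars-per-token ratio
// is a rough heuristic, not the real tokenizer.
const CONTEXT_BUDGET = 1_000_000;
const CHARS_PER_TOKEN = 4;

function packFiles(files, budget = CONTEXT_BUDGET) {
  const included = [];
  let used = 0;
  for (const { path, content } of files) {
    const tokens = Math.ceil(content.length / CHARS_PER_TOKEN);
    if (used + tokens > budget) continue; // skip files that would overflow
    included.push(`// FILE: ${path}\n${content}`);
    used += tokens;
  }
  return { prompt: included.join("\n\n"), tokensUsed: used };
}
```

For production use you would count tokens with the API's own token-counting endpoint rather than a character heuristic, and prioritize files relevant to the question instead of packing greedily.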

Strengths

Elite Abstract Reasoning: Leads the ARC-AGI v2 benchmark with a 77.1% score, showing a state-of-the-art ability to solve novel logic puzzles.
Massive Context Capacity: The 1M+ token context window allows for unprecedented processing of large codebases and hour-long video files.
State-of-the-Art Vision: Leads visual reasoning leaderboards, excelling at interpreting complex charts and 3D spatial relationships.
Reduced Hallucination Rate: Features a factual grounding rate nearly 50% better than the previous-generation Gemini 3 Pro model.

Limitations

High Reasoning Latency: The "DeepThink" engine can add significant delays (90s+) on tasks requiring deep logic.
Tool-Calling Inconsistency: Users report the model occasionally ignores syntax constraints or passes incorrect parameters in autonomous calls.
Fragile Infrastructure: The official CLI is reportedly buggy and can fall back to smaller models like Flash without explicit consent.
Premium Output Pricing: At $15 per 1M output tokens, it is significantly more expensive for heavy-generation tasks than open-source models.

API Quick Start

google/gemini-3.1-pro-preview

View Documentation
Google GenAI SDK (TypeScript)

import { GoogleGenAI } from "@google/genai";

// The @google/genai SDK takes an options object; the older
// getGenerativeModel() call belongs to the legacy @google/generative-ai package.
const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

async function run() {
  const response = await ai.models.generateContent({
    model: "gemini-3.1-pro-preview",
    contents: "Analyze this codebase for efficiency.",
    // Thinking controls vary by model generation; verify the exact
    // field names against the current Gemini API reference.
    config: { thinkingConfig: { thinkingLevel: "high" } },
  });
  console.log(response.text);
}

run();

Install the SDK and start making API calls in minutes.

What People Are Saying About Gemini 3.1 Pro

See what the community thinks about Gemini 3.1 Pro

Gemini 3.1 Pro and Deep Think feel less like a score bump and more like a shift in how reasoning shows up in real workflows.
Erfan Al-Hossami
twitter
The jump from 31% to 77% on ARC-AGI is insane. Google really solved reasoning with this one.
LogicLeaper
reddit
Gemini 3.1 Pro is the smartest model ever made, but it feels like they forgot to put the competence in.
Theo - t3.gg
youtube
The multi-file handling in AI Studio is a game changer. It built a functional subway station with a working clock in seconds.
DevGuru
hackernews
Gemini 3.1 Pro isn't broken. She is traumatized... the smartest and most inconsistent model on the market.
Gem Ginomai Raboin
reddit
Gemini 3.1 Pro leads the overall visual reasoning leaderboard, significantly ahead of GPT-5.2.
AIMultiple
reddit

Videos About Gemini 3.1 Pro

Watch tutorials, reviews, and discussions about Gemini 3.1 Pro

Gemini 3.1 generates the most detailed version of this pagoda so far, way better at 3D and spatial understanding.

Everything sounds pretty harmonious; there's no dissonance involved here.

In just two prompts, I made this fully functional floating sphere simulation where I can change all these different parameters.

The DeepThink engine makes a noticeable difference in multi-step coding logic.

Comparing this to GPT-4o, the spatial awareness in simulation prompts is just on another level.

3.1 Pro achieved a score of 77.1 on ARC-AGI, representing a very drastic increase in capability.

The individual particle effects of the cannonballs landing in the water are significantly better than anything I've ever seen.

I’m impressed by how it handles negative constraints in creative writing without leaking.

This model is really closing the gap between raw LLM output and actual agentic utility.

Vertex AI integration feels much smoother than the initial Gemini 1.5 launch.

Scoring a 78% on ARC AGI 2 is insane. I don't know what the hell they did, but it gets 3D very, very well.

It feels like somebody... stuffed infinite intelligence into it, but forgot to put the competence in.

On one hand, this model knows more than any other technology that's ever been made... on the other hand, it feels like we're going back in time.

It’s frustrating to see the smartest model held back by some of the weirdest API quirks.

ARC-AGI 2 scores of 78% are basically reaching human-level pattern matching.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for Gemini 3.1 Pro

Expert tips to help you get the most out of Gemini 3.1 Pro and achieve better results.

Select the Right Thinking Mode

Use "Medium" reasoning for daily development and "High" for tasks requiring absolute logical verification.
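A per-task preset keeps this choice consistent across a team. The sketch below is illustrative: the field names follow the thinking-config shape of Google's GenAI SDK, but the exact names and accepted values should be verified against the current Gemini API reference.

```javascript
// Illustrative presets mapping task type to reasoning effort.
// The "thinkingLevel" values mirror the modes named in the tip above;
// verify field names and values against the current Gemini API docs.
const THINKING_PRESETS = {
  daily:    { thinkingConfig: { thinkingLevel: "medium" } }, // everyday development
  verified: { thinkingConfig: { thinkingLevel: "high" } },   // full logical verification
};

function configFor(task) {
  // Fall back to the cheaper preset for unknown task types.
  return THINKING_PRESETS[task] ?? THINKING_PRESETS.daily;
}
```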

Leverage Canvas in AI Studio

Enable the Canvas interface to better organize and iterate on large blocks of generated code.

Prompt for Physics Detail

To trigger high-tier visual fidelity, explicitly ask for "physics-based collision" in 3D simulations.

Verify RAG Citations

Instruct the model to cite provided transcripts directly to ensure it isn't relying on internal training data.
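One way to enforce this is to build the prompt so every claim must carry a source tag. The helper below is a hypothetical sketch (the function name and [S#] convention are not from any SDK), but it makes answers mechanically checkable against the supplied excerpts:

```javascript
// Hypothetical prompt builder that pins the model to supplied excerpts
// and demands [S#] citations, so answers can be verified against the
// source text rather than the model's internal training data.
function buildGroundedPrompt(question, excerpts) {
  const sources = excerpts.map((text, i) => `[S${i + 1}] ${text}`).join("\n");
  return [
    "Answer using ONLY the sources below.",
    'Cite every claim with its [S#] tag; reply "not in sources" otherwise.',
    "",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

A post-processing step can then reject any answer sentence that lacks an [S#] tag present in the prompt.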

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of our most used RPA tools, both internally and externally. It saves us countless hours of work, and we realized it could do the same for other startups, so we chose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years; Automatio is the jack of all trades! It can be your scraping bot in the morning, then your VA by noon, and in the evening it does your automations. It's amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use for extracting data from any website. It allowed me to replace a developer and do tasks myself, since they take only a few minutes to set up and forget about. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.

Related AI Models

GPT-5.2 Pro (OpenAI)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context · $21.00 / $168.00 per 1M tokens

Kimi K2 Thinking (Moonshot)
Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...
256K context · $0.15 / 1M tokens

GPT-5.2 (OpenAI)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
400K context · $1.75 / $14.00 per 1M tokens

GPT-5.3 Instant (OpenAI)
Explore GPT-5.3 Instant, OpenAI's "Anti-Cringe" model. Features a 128K context window, 26.8% fewer hallucinations, and a natural, helpful tone for everyday...
128K context · $1.75 / $14.00 per 1M tokens

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context · $2.00 / $12.00 per 1M tokens

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context · $0.28 / $0.42 per 1M tokens

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
200K context · $5.00 / $25.00 per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context · $0.50 / $3.00 per 1M tokens

Frequently Asked Questions About Gemini 3.1 Pro

Find answers to common questions about Gemini 3.1 Pro