Google

Gemini 3.1 Pro

Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.

Multimodal · Deep Reasoning · Video Generation · Workspace AI · Google Gemini
Google · Gemini · Released February 19, 2026
Context: 1.0M tokens
Max Output: 66K tokens
Input Price: $2.00 / 1M tokens
Output Price: $12.00 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
94.3%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Gemini 3.1 Pro scored 94.3% on this benchmark.
HLE
44.4%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning a wide range of specialized domains, designed to sit at the frontier of human knowledge and resist saturation by frontier models. Gemini 3.1 Pro scored 44.4% on this benchmark.
MMLU
98%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Gemini 3.1 Pro scored 98% on this benchmark.
MMLU Pro
90.5%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Gemini 3.1 Pro scored 90.5% on this benchmark.
SimpleQA
72.1%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Gemini 3.1 Pro scored 72.1% on this benchmark.
IFEval
90%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Gemini 3.1 Pro scored 90% on this benchmark.
AIME 2025
95%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Gemini 3.1 Pro scored 95% on this benchmark.
MATH
95%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Gemini 3.1 Pro scored 95% on this benchmark.
GSM8k
98%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Gemini 3.1 Pro scored 98% on this benchmark.
MGSM
95%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Gemini 3.1 Pro scored 95% on this benchmark.
MathVista
75%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Gemini 3.1 Pro scored 75% on this benchmark.
SWE-Bench
80.6%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Gemini 3.1 Pro scored 80.6% on this benchmark.
HumanEval
94%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Gemini 3.1 Pro scored 94% on this benchmark.
LiveCodeBench
75%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Gemini 3.1 Pro scored 75% on this benchmark.
MMMU
81%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Gemini 3.1 Pro scored 81% on this benchmark.
MMMU Pro
81%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Gemini 3.1 Pro scored 81% on this benchmark.
ChartQA
90%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Gemini 3.1 Pro scored 90% on this benchmark.
DocVQA
92%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Gemini 3.1 Pro scored 92% on this benchmark.
Terminal-Bench
68.5%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Gemini 3.1 Pro scored 68.5% on this benchmark.
ARC-AGI
77.1%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Gemini 3.1 Pro scored 77.1% on this benchmark.

About Gemini 3.1 Pro

Learn about Gemini 3.1 Pro's capabilities, features, and how it can help you achieve better results.

Gemini 3.1 Pro is a mature execution of the Sparse Mixture-of-Experts (MoE) framework, natively paired with an advanced multimodal processing engine. Its standout feature is the broadly available DeepThink System 2 layer, which lets the model deliberate internally before committing to an output token. The model introduces a three-tier thinking system (Low, Medium, and High) that gives developers explicit control over the trade-off between latency, cost, and reasoning depth.

With a massive 1-million-token context window, Gemini 3.1 Pro is highly optimized for complex workflows in finance, data analytics, and whole-repository code migrations. It demonstrates an emergent ability to solve novel logic patterns, scoring an unprecedented 77.1% on the ARC-AGI-2 benchmark. This makes it a preferred choice for developers who require both low-latency multimodal interactions and high-level cognitive performance for autonomous agentic tasks.

Use Cases

Discover the different ways you can use Gemini 3.1 Pro to achieve great results.

Whole-Repository Code Analysis

Utilizing the 1M context window to ingest entire software repositories for refactoring and dependency mapping.

Autonomous Agent Committees

Driving multi-step agentic workflows where internal sub-agents debate and verify solutions before execution.

Scientific Research Synthesis

Analyzing thousands of research papers and complex datasets to extract structured intelligence and factual insights.

Multimodal Content Creation

Simultaneously processing text, images, and audio to generate complex instructional materials and interactive media.

Terminal-Based Automation

Executing complex bash commands and manipulating file systems with high precision via advanced reasoning modes.

Enterprise Data Auditing

Parsing unstructured financial data and legal documents to identify compliance gaps with near-perfect factual recall.

Strengths

ARC-AGI-2 Reasoning Leader: Scored 77.1% on ARC-AGI-2, more than doubling the reasoning capability of previous flagship models.
1M Token Context Window: Handles massive multi-file codebases and long-form video with state-of-the-art recall and low latency.
Competitive Pricing Strategy: Priced at $2/$12 per million tokens, making it significantly more affordable than Anthropic or OpenAI equivalents.
Granular Compute Tiers: Features a three-tier thinking system for precise developer control over internal reasoning depth and cost.

Limitations

Large Context Pricing Penalty: Input and output prices double once a prompt exceeds the 200,000-token threshold, impacting massive batch jobs.
Extreme Output Verbosity: Benchmarks indicate the model can be overly verbose, generating significantly more tokens than required for simple tasks.
Nuanced Tone Challenges: Community feedback suggests the conversational tone can feel less natural or nuanced compared to the Claude 3.5 series.
Inconsistent Reasoning Tiers: Reasoning quality varies significantly between tiers, often requiring manual experimentation to find the optimal setting.
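The pricing notes above can be sketched as a quick cost estimator. This is illustrative only: the $2/$12 per-million rates and the 200,000-token threshold come from this page, while the exact 2× long-context multiplier is an assumption based on the "prices double" limitation.

```typescript
// Illustrative cost estimator for the pricing described on this page:
// $2.00 / 1M input tokens, $12.00 / 1M output tokens, with both rates
// assumed to double once the prompt exceeds 200,000 tokens.
const INPUT_PER_M = 2.0;
const OUTPUT_PER_M = 12.0;
const LONG_CONTEXT_THRESHOLD = 200_000;
const LONG_CONTEXT_MULTIPLIER = 2; // assumption: "prices double"

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const mult = inputTokens > LONG_CONTEXT_THRESHOLD ? LONG_CONTEXT_MULTIPLIER : 1;
  const cost =
    (inputTokens / 1_000_000) * INPUT_PER_M * mult +
    (outputTokens / 1_000_000) * OUTPUT_PER_M * mult;
  return Math.round(cost * 10_000) / 10_000; // round away float noise
}

// A 100K-token prompt with a 10K-token response stays under the threshold:
console.log(estimateCostUSD(100_000, 10_000)); // 0.32
// A 300K-token prompt crosses it, so both rates double:
console.log(estimateCostUSD(300_000, 20_000)); // 1.68
```

In practice this makes it worth splitting batch jobs so individual prompts stay under the threshold where possible.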

API Quick Start

google/gemini-3.1-pro-preview

View Documentation
Google Gen AI SDK (JavaScript/TypeScript)
import { GoogleGenAI } from "@google/genai";

// The @google/genai client takes an options object, not a bare key string.
const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-3.1-pro-preview",
  contents: "Analyze this entire codebase for security vulnerabilities.",
  // "high" follows this page's Low/Medium/High thinking-tier naming.
  config: { thinkingConfig: { thinkingLevel: "high" } },
});
console.log(response.text);

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Gemini 3.1 Pro

Gemini 3.1 Pro's 77.1% score represents the most disruptive market shift; it more than doubles the previous high on ARC-AGI.
enoumen
reddit
The coding benchmarks don't lie. This model found a bug in my repo that 3.5 and GPT-4o missed completely.
SiliconValleyCoder
hackernews
The gemini 3.1 shitstorm is really interesting. It crushed benchmarks but real users are saying the tone and vibe are inconsistent.
cryptopunk7213
twitter
The DeepThink engine can lead to significant delays, sometimes over 90 seconds, when processing tasks requiring deep logic.
TechReviewer2026
youtube
Context caching is the killer feature here. I'm running an entire documentation bot for pennies compared to GPT-4o.
CloudArchitect
reddit
Gemini failed to discuss Python at all in a complex planning task... some logic was just not present in its final plan.
Temporary-Mix8022
reddit

Related Videos

Watch tutorials, reviews, and discussions about Gemini 3.1 Pro

Gemini 3.1 Pro generates the most detailed version of this pagoda so far

Gemini has by far the widest window of a million tokens

The multimodal fidelity in audio processing is noticeably better than 3.0

Token throughput remains stable even as the context window fills up

Long-term recall is basically perfect across the entire million tokens

On puzzles that shouldn't be in its training data, the Gemini 3 series outperforms all other models

3.1 Pro could indeed reduce the runtime of a fine-tuning script from 300 seconds to 47 seconds

DeepThink logic steps are clearly visible in the trace, showing real deliberation

We are reaching benchmark saturation where only ARC-AGI really matters for progress

The AGI trajectory is accelerating based on these abstract reasoning jumps

I do think that like 3.1 it genuinely feels like a step up, even if it's just very slight

It does seem to outperform Gemini 3.0 Pro when we test the exact same prompts side by side

Coding accuracy on complex Python refactors is the highest I have seen

API reliability has improved significantly over the last month of testing

Real-world performance finally matches the hype of the benchmark scores

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Gemini 3.1 Pro and achieve better results.

Reasoning Tier Selection

Use High thinking mode for complex math or logic, but switch to Low for standard formatting to save compute.
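This tier-selection advice can be encoded as a simple routing helper. The task categories and keyword mapping below are illustrative assumptions, not part of any official API; only the Low/Medium/High tier names come from this page.

```typescript
// Hypothetical routing helper for the Low/Medium/High thinking tiers.
// The keyword heuristic is illustrative, not an official mechanism.
type ThinkingTier = "low" | "medium" | "high";

function pickTier(task: string): ThinkingTier {
  const t = task.toLowerCase();
  // Deep reasoning: complex math, multi-step logic, tricky debugging.
  if (/\b(prove|math|logic|puzzle|debug)\b/.test(t)) return "high";
  // Moderate effort: summarization, analysis, refactoring.
  if (/\b(summarize|analyze|refactor)\b/.test(t)) return "medium";
  // Everything else (formatting, extraction, templating) stays cheap.
  return "low";
}

console.log(pickTier("Prove this invariant holds for the scheduler")); // high
console.log(pickTier("Summarize the quarterly report"));               // medium
console.log(pickTier("Convert this list to JSON"));                    // low
```

A real router would classify with the model itself or with request metadata, but even a static map like this keeps routine traffic off the expensive tier.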

Context Caching

Implement context caching for static documentation to reduce input prices by up to 90% per million tokens.
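The savings from that tip can be turned into a rough estimate. The calculation below is illustrative: the $2/1M input rate comes from this page, and the 90% cached-token discount is taken from the tip above rather than from an official price sheet.

```typescript
// Rough daily input-cost comparison for cached vs. uncached static context,
// assuming this page's $2.00/1M input rate and an up-to-90% cache discount.
const INPUT_PER_M = 2.0;
const CACHE_DISCOUNT = 0.9; // assumption: "up to 90%" per the tip above

function dailyInputCostUSD(
  staticTokens: number,
  requestsPerDay: number,
  cached: boolean
): number {
  const rate = cached ? INPUT_PER_M * (1 - CACHE_DISCOUNT) : INPUT_PER_M;
  const cost = (staticTokens / 1_000_000) * rate * requestsPerDay;
  return Math.round(cost * 100) / 100;
}

// A 500K-token documentation corpus hit 200 times a day:
console.log(dailyInputCostUSD(500_000, 200, false)); // 200 ($200/day uncached)
console.log(dailyInputCostUSD(500_000, 200, true));  // 20  ($20/day cached)
```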

Structured Artifacts

Leverage the model's ability to generate structured task lists for easier human supervision during agentic runs.

Multimodal Prompting

Combine video and audio inputs to give the model full context of real-world scenarios rather than text-only descriptions.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used RPA tools for us, both internally and externally. It saves us countless hours of work, and we realized it could do the same for other startups, so we chose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, and Automatio is the jack of all trades! It can be your scraping bot in the morning, your VA by noon, and in the evening it runs your automations. It's amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use for extracting data from any website. It allowed me to replace a developer and do tasks myself, since they take only a few minutes to set up and forget about. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Gemini 3.1 Flash Live Preview

Google

Gemini 3.1 Flash Live Preview is Google's ultra-low-latency, audio-to-audio model featuring a 131K context window, high-fidelity multimodal reasoning, and...

131K context
$0.75/$4.50/1M

Grok-3

xAI

Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.

1M context
$3.00/$15.00/1M

GPT-5.2 Pro

OpenAI

GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.

400K context
$21.00/$168.00/1M

Gemini 3 Pro

Google

Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.

1M context
$2.00/$12.00/1M

Claude Opus 4.6

Anthropic

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

1M context
$5.00/$25.00/1M

Gemini 3 Flash

Google

Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.

1M context
$0.50/$3.00/1M

Claude Sonnet 4.6

Anthropic

Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.

1M context
$3.00/$15.00/1M

Qwen3.5-397B-A17B

Alibaba

Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...

1M context
$0.40/$2.40/1M

Frequently Asked Questions

Find answers to common questions about Gemini 3.1 Pro