Google

Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model, featuring a 1M-token context window, native multimodality, and 363 tokens/sec output speed for work at scale.

Tags: Multimodal · High Speed · Cost Efficient · Google Gemini
Google · Gemini 3.1 · Released 2026-03-03
Context: 1.0M tokens
Max Output: 66K tokens
Input Price: $0.25 / 1M tokens
Output Price: $1.50 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming
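
The listed rates make per-request cost easy to estimate. A minimal sketch, using the prices from this page ($0.25 per 1M input tokens, $1.50 per 1M output tokens); the helper name is illustrative, not part of any SDK:

```javascript
// Estimate a single request's cost from Gemini 3.1 Flash-Lite's listed rates.
const RATES = { inputPerMillion: 0.25, outputPerMillion: 1.5 };

function estimateCostUSD(inputTokens, outputTokens, rates = RATES) {
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}

// Example: summarizing a 200K-token document into a 2K-token answer.
console.log(estimateCostUSD(200_000, 2_000)); // ≈ $0.053
```

At these rates, even context-heavy workloads stay in fractions of a cent to a few cents per call, which is what makes the "feed everything into context" pattern discussed below economical.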
Benchmarks
GPQA
86.9%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Gemini 3.1 Flash-Lite scored 86.9% on this benchmark.
HLE
16%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized academic domains, designed to remain difficult even for state-of-the-art models. Scores are low across the board; even leading models answer only a small fraction correctly. Gemini 3.1 Flash-Lite scored 16% on this benchmark.
MMLU
88.9%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Gemini 3.1 Flash-Lite scored 88.9% on this benchmark.
MMLU Pro
80%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Gemini 3.1 Flash-Lite scored 80% on this benchmark.
SimpleQA
43.3%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Gemini 3.1 Flash-Lite scored 43.3% on this benchmark.
IFEval
85%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Gemini 3.1 Flash-Lite scored 85% on this benchmark.
AIME 2025
25%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Gemini 3.1 Flash-Lite scored 25% on this benchmark.
MATH
78%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Gemini 3.1 Flash-Lite scored 78% on this benchmark.
GSM8k
95%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Gemini 3.1 Flash-Lite scored 95% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Gemini 3.1 Flash-Lite scored 92% on this benchmark.
MathVista
75%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Gemini 3.1 Flash-Lite scored 75% on this benchmark.
SWE-Bench
35%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from under 5% in 2023 to over 70% by 2025. Gemini 3.1 Flash-Lite scored 35% on this benchmark.
HumanEval
88%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Gemini 3.1 Flash-Lite scored 88% on this benchmark.
LiveCodeBench
72%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Gemini 3.1 Flash-Lite scored 72% on this benchmark.
MMMU
76.8%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Gemini 3.1 Flash-Lite scored 76.8% on this benchmark.
MMMU Pro
76.8%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Gemini 3.1 Flash-Lite scored 76.8% on this benchmark.
ChartQA
91%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Gemini 3.1 Flash-Lite scored 91% on this benchmark.
DocVQA
92%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Gemini 3.1 Flash-Lite scored 92% on this benchmark.
Terminal-Bench
55%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Gemini 3.1 Flash-Lite scored 55% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Gemini 3.1 Flash-Lite scored 12% on this benchmark.

About Gemini 3.1 Flash-Lite

Learn about Gemini 3.1 Flash-Lite's capabilities, features, and how it can help you achieve better results.

Optimized for High-Speed Intelligence

Gemini 3.1 Flash-Lite is Google’s high-speed workhorse model, designed specifically for high-volume developer workloads where low latency and cost efficiency are paramount. Released on March 3, 2026, it serves as an optimized entry in the Gemini 3.1 series, delivering 2.5x faster time-to-first-token and a 45% increase in output speed compared to previous generations. It is capable of streaming over 360 tokens per second, making it ideal for real-time applications and massive-scale data processing.

Natively Multimodal with 1M Context

The model is natively multimodal, supporting text, image, audio, video, and PDF inputs within a massive 1 million-token context window. This allows developers to process enormous datasets, such as hour-long videos or massive legal archives, without the need for complex RAG pipelines. Its vision capabilities are particularly strong, excelling at document visual question answering and chart analysis.
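
Mixed-modality inputs are expressed as a list of role/parts turns in the Gemini API's `generateContent` request shape, with binary media passed as base64 `inlineData`. A minimal sketch; the base64 string is a stand-in and the model id is this page's preview identifier:

```javascript
// Build a text + image request body in the role/parts shape the
// Gemini API's generateContent endpoint accepts.
function buildMultimodalRequest(question, imageBase64) {
  return {
    model: 'gemini-3.1-flash-lite-preview',
    contents: [
      {
        role: 'user',
        parts: [
          { text: question },
          // Binary media travels as base64-encoded inlineData with a MIME type.
          { inlineData: { mimeType: 'image/png', data: imageBase64 } },
        ],
      },
    ],
  };
}

const req = buildMultimodalRequest('What does this chart show?', 'iVBORw0KG...');
console.log(req.contents[0].parts.length); // 2 parts: text + image
```

The same parts structure extends to audio, video, and PDF inputs by changing the MIME type, which is how hour-long videos fit into a single request.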

Granular Developer Control

A standout feature is the introduction of 'Thinking Levels' (Minimal, Low, Medium, High). This parameter allows developers to granularly dial the model's reasoning depth up or down based on the task's complexity. This flexibility ensures that users don't overpay for simple tasks like classification while still having access to enhanced logic for more structured outputs like UI generation and data extraction.
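
In practice this invites a routing layer that picks a thinking level per request. A hypothetical sketch, assuming the four level names from this page; the task categories and defaults are illustrative, not an official API:

```javascript
// Map task complexity to a thinking level so cheap tasks don't pay for
// deep reasoning. Level names come from the model's documented options.
const THINKING_LEVELS = {
  classification: 'minimal',
  summarization: 'low',
  data_extraction: 'medium',
  ui_generation: 'high',
};

function thinkingLevelFor(taskType) {
  // Unknown task types fall back to a cheap-but-safe default.
  return THINKING_LEVELS[taskType] ?? 'low';
}

console.log(thinkingLevelFor('classification')); // 'minimal'
console.log(thinkingLevelFor('translation'));    // 'low' (fallback)
```

The returned level would then be passed through the request's thinking configuration, as in the quick-start snippet later on this page.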

Use Cases for Gemini 3.1 Flash-Lite

Discover the different ways you can use Gemini 3.1 Flash-Lite to achieve great results.

High-Volume Real-Time Translation

Seamlessly process thousands of chat messages or support tickets across 100+ languages with minimal latency and high cost-efficiency.

Multimodal Content Moderation

Utilize native video and image processing to flag inappropriate content in high-throughput social media feeds or video platforms.

Automated Structured Data Extraction

Extract complex JSON schemas from massive PDF archives or long-form legal documents using the 1M token context window.
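
Structured extraction is typically driven by the Gemini API's structured-output settings (`responseMimeType` plus a `responseSchema`). A sketch of such a config for a contract-extraction task; the schema's field names are illustrative assumptions:

```javascript
// Request config asking the model to return JSON that conforms to a schema,
// in the style of the Gemini API's structured-output options.
const extractionConfig = {
  responseMimeType: 'application/json',
  responseSchema: {
    type: 'object',
    properties: {
      parties: { type: 'array', items: { type: 'string' } },
      effectiveDate: { type: 'string' },
      totalValueUSD: { type: 'number' },
    },
    required: ['parties', 'effectiveDate'],
  },
};

console.log(Object.keys(extractionConfig.responseSchema.properties)); // field names
```

Constraining the output this way makes downstream parsing deterministic, which matters when the input is a thousand-page PDF archive rather than a single document.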

Agile Front-End Prototyping

Rapidly generate functional React/Tailwind UI components and landing pages at over 360 tokens per second for iterative design.

Agentic Task Orchestration

Power 'always-on' AI agents that perform multi-step planning, web research, and tool use without breaking the token budget.
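
Tool use in the Gemini API is declared through `functionDeclarations` the model can choose to call. A minimal sketch; the `search_web` function and its parameters are hypothetical, standing in for whatever tools an agent exposes:

```javascript
// Declare a web-search tool in the Gemini API's functionDeclarations shape
// so the model can request a search step during multi-step planning.
const tools = [
  {
    functionDeclarations: [
      {
        name: 'search_web', // hypothetical tool; implement and dispatch it yourself
        description: 'Search the web and return the top result snippets.',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string', description: 'The search query.' },
          },
          required: ['query'],
        },
      },
    ],
  },
];

console.log(tools[0].functionDeclarations[0].name); // 'search_web'
```

The agent loop then watches responses for a function call, executes the named tool, and feeds the result back as the next turn.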

Low-Latency Customer Service Bots

Deploy conversational assistants that provide instantaneous responses with adjustable reasoning for simple vs. complex queries.

Strengths

Unmatched Throughput: Streams at 363 tokens per second, making it 45% faster than 2.5 Flash for real-time agentic applications.
Aggressive Pricing: At $0.25/M input tokens, it is roughly 1/8th the cost of Gemini 3.1 Pro while maintaining high general intelligence.
Native Multimodal Mastery: Exceptional performance on vision (92% DocVQA) and video (84.8% VideoMMMU) without requiring separate encoders.
Granular Compute Control: The first model to offer precise control over reasoning depth, enabling optimization of the cost-to-performance ratio.

Limitations

Reasoning Ceiling: Significantly lower performance on abstract logic (12% ARC-AGI v2) compared to flagship reasoning-specific models.
Math Olympiad Gaps: Struggles with elite-level mathematics, scoring only 25% on AIME 2025 compared to 90%+ for frontier models.
Factuality Calibration: Faces higher hallucination rates in fact-seeking tasks (43.3% SimpleQA) than Pro-tier or frontier alternatives.
Instruction Drift: Can occasionally miss minor formatting constraints in extremely long, complex multi-step instructions.

API Quick Start

google/gemini-3.1-flash-lite-preview

View Documentation
Google SDK (JavaScript)
import { GoogleGenAI } from '@google/genai';

// The @google/genai SDK takes an options object, not a bare API key
const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

async function generate() {
  const response = await ai.models.generateContent({
    model: 'gemini-3.1-flash-lite-preview',
    contents: 'Extract key entities from this document.',
    config: {
      thinkingConfig: { thinkingLevel: 'low' },
    },
  });
  console.log(response.text);
}

generate();

Install the SDK and start making API calls in minutes.

What People Are Saying About Gemini 3.1 Flash-Lite

See what the community thinks about Gemini 3.1 Flash-Lite

Flash lite is crazy fast and effective for specific workflows like summarization... this is a welcome speed jump.
reddit user
reddit
Gemini 3.1 Flash-Lite is the quiet kill shot for mid-tier API providers... the cost curves compound fast.
@9chaku
twitter
3.1 Flash-Lite outperforms 2.5 Flash across a majority of benchmarks while being a little speedster!
Tulsee Doshi
twitter
For builders running AI agents at scale, this is the model that makes 'always-on' actually affordable. 363 t/s is wild.
@prince_twets
twitter
The pricing is insane. $0.25 for 1M input makes it cheaper to just feed entire repos into context than build RAG.
reddit user
reddit
The speed to first token is basically instant. It's the first time a model has felt faster than my own typing.
DevGuru
hackernews

Videos About Gemini 3.1 Flash-Lite

Watch tutorials, reviews, and discussions about Gemini 3.1 Flash-Lite

Pricing comes in at 25 cents per 1 million input tokens and $1.50 per 1 million output tokens... still quite competitive considering the speed.

I am finding this model to be an underrated coding model focusing on front-end development and it delivers extremely fast tokens.

This is really targeting the developer who needs scale without the latency of a Pro model.

The multimodality here isn't just a gimmick; it's handling complex PDFs with ease.

Google is really pushing the boundary of what a 'lite' model can actually achieve in 2026.

This time, it's Gemini 3.1 Flash Light, which is supposed to be a faster and less expensive version of the Flash model.

These models are needed because you want to use them in applications where you need high throughput.

The 1 million context window is standard now for Gemini, but seeing it on a model this fast is impressive.

It's not going to win a math olympiad, but it's perfect for extraction and summarization.

The API latency is significantly lower than GPT-4o-mini in my early testing.

This new AI model from Google is 45% faster... and it might just change how every single one of us builds with AI.

Low thinking mode for the quick, easy stuff. High thinking mode for the heavy lifting... that flexibility is what separates a toy from a real tool.

For SEO tasks, this is going to be my daily driver because of the price point.

The fact that it can see a video and understand the context almost instantly is a game changer for content creators.

Google is making it very hard to justify using other providers for high-volume tasks right now.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for Gemini 3.1 Flash-Lite

Expert tips to help you get the most out of Gemini 3.1 Flash-Lite and achieve better results.

Leverage Thinking Levels

Set thinking_level to 'minimal' for simple tasks like classification to maximize speed, but use 'high' for structured code generation.

Native Video Analysis

Feed raw video files directly into the API for faster insights on visual events and audio cues simultaneously, bypassing transcript steps.

Context Over RAG

For datasets under 1M tokens, feed the entire document set into the context window to eliminate retrieval errors and vector DB costs.
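
Before committing to this pattern, it helps to check whether the corpus actually fits. A rough sketch using the common ~4-characters-per-token rule of thumb (an approximation, not an exact tokenizer count), with headroom reserved for the prompt and response:

```javascript
// Rough fit check: does a document set fit the 1M-token context window,
// leaving room for the prompt and the model's response?
const CONTEXT_WINDOW = 1_000_000;

function fitsInContext(documents, reservedTokens = 50_000) {
  const totalChars = documents.reduce((sum, doc) => sum + doc.length, 0);
  const estimatedTokens = Math.ceil(totalChars / 4); // ~4 chars/token heuristic
  return estimatedTokens + reservedTokens <= CONTEXT_WINDOW;
}

console.log(fitsInContext(['a'.repeat(3_000_000)])); // true  (~750K tokens)
console.log(fitsInContext(['a'.repeat(4_000_000)])); // false (~1M tokens + reserve)
```

For an exact count, the API's token-counting endpoint is the authoritative check; the heuristic is just a cheap pre-filter.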

Optimize with Batching

Use the batching API for non-urgent tasks to further reduce costs, as Flash-Lite is specifically optimized for asynchronous processing.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context · $5.00 / $25.00 per 1M tokens

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context · $3.00 / $15.00 per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context · $0.60 / $2.50 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI's advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context · $1.25 / $10.00 per 1M tokens

GLM-4.7 (Zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context · $0.60 / $2.20 per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context · $0.60 / $3.60 per 1M tokens

Claude 3.7 Sonnet (Anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.
200K context · $3.00 / $15.00 per 1M tokens

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
128K context · $3.00 / $15.00 per 1M tokens

Frequently Asked Questions About Gemini 3.1 Flash-Lite

Find answers to common questions about Gemini 3.1 Flash-Lite