Google

Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model, featuring a 1M-token context window, native multimodality, and 363 tokens/sec output speed for work at scale.

Tags: Multimodal · High Speed · Cost Efficient · Google Gemini
Google · Gemini 3.1 · Released 2026-03-03
Context: 1.0M tokens
Max Output: 66K tokens
Input Price: $0.25 / 1M tokens
Output Price: $1.50 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming
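
The listed rates make per-request cost easy to estimate. A minimal sketch, using the prices from this page ($0.25 per 1M input tokens, $1.50 per 1M output tokens); the helper name is illustrative, not part of any SDK:

```javascript
// Estimate a single request's cost from Gemini 3.1 Flash-Lite's listed rates.
const RATES = { inputPerMillion: 0.25, outputPerMillion: 1.5 };

function estimateCostUSD(inputTokens, outputTokens, rates = RATES) {
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}

// Example: summarizing a 200K-token document into a 2K-token answer.
console.log(estimateCostUSD(200_000, 2_000)); // ≈ $0.053
```

At these rates, even context-heavy workloads stay in fractions of a cent to a few cents per call, which is what makes the "feed everything into context" pattern discussed below economical.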
Benchmarks
GPQA
86.9%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Gemini 3.1 Flash-Lite scored 86.9% on this benchmark.
HLE
16%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized academic domains, designed to remain difficult even for state-of-the-art models. Scores are low across the board; even leading models answer only a small fraction correctly. Gemini 3.1 Flash-Lite scored 16% on this benchmark.
MMLU
88.9%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Gemini 3.1 Flash-Lite scored 88.9% on this benchmark.
MMLU Pro
80%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Gemini 3.1 Flash-Lite scored 80% on this benchmark.
SimpleQA
43.3%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Gemini 3.1 Flash-Lite scored 43.3% on this benchmark.
IFEval
85%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Gemini 3.1 Flash-Lite scored 85% on this benchmark.
AIME 2025
25%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Gemini 3.1 Flash-Lite scored 25% on this benchmark.
MATH
78%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Gemini 3.1 Flash-Lite scored 78% on this benchmark.
GSM8k
95%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Gemini 3.1 Flash-Lite scored 95% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Gemini 3.1 Flash-Lite scored 92% on this benchmark.
MathVista
75%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Gemini 3.1 Flash-Lite scored 75% on this benchmark.
SWE-Bench
35%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from under 5% in 2023 to over 70% by 2025. Gemini 3.1 Flash-Lite scored 35% on this benchmark.
HumanEval
88%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Gemini 3.1 Flash-Lite scored 88% on this benchmark.
LiveCodeBench
72%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Gemini 3.1 Flash-Lite scored 72% on this benchmark.
MMMU
76.8%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Gemini 3.1 Flash-Lite scored 76.8% on this benchmark.
MMMU Pro
76.8%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Gemini 3.1 Flash-Lite scored 76.8% on this benchmark.
ChartQA
91%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Gemini 3.1 Flash-Lite scored 91% on this benchmark.
DocVQA
92%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Gemini 3.1 Flash-Lite scored 92% on this benchmark.
Terminal-Bench
55%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Gemini 3.1 Flash-Lite scored 55% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Gemini 3.1 Flash-Lite scored 12% on this benchmark.

About Gemini 3.1 Flash-Lite

Learn about Gemini 3.1 Flash-Lite's capabilities, features, and how it can help you achieve better results.

Optimized for High-Speed Intelligence

Gemini 3.1 Flash-Lite is Google’s high-speed workhorse model, designed specifically for high-volume developer workloads where low latency and cost efficiency are paramount. Released on March 3, 2026, it serves as an optimized entry in the Gemini 3.1 series, delivering 2.5x faster time-to-first-token and a 45% increase in output speed compared to previous generations. It is capable of streaming over 360 tokens per second, making it ideal for real-time applications and massive-scale data processing.

Natively Multimodal with 1M Context

The model is natively multimodal, supporting text, image, audio, video, and PDF inputs within a massive 1 million-token context window. This allows developers to process enormous datasets, such as hour-long videos or massive legal archives, without the need for complex RAG pipelines. Its vision capabilities are particularly strong, excelling at document visual question answering and chart analysis.
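
Mixed-modality inputs are expressed as a list of role/parts turns in the Gemini API's `generateContent` request shape, with binary media passed as base64 `inlineData`. A minimal sketch; the base64 string is a stand-in and the model id is this page's preview identifier:

```javascript
// Build a text + image request body in the role/parts shape the
// Gemini API's generateContent endpoint accepts.
function buildMultimodalRequest(question, imageBase64) {
  return {
    model: 'gemini-3.1-flash-lite-preview',
    contents: [
      {
        role: 'user',
        parts: [
          { text: question },
          // Binary media travels as base64-encoded inlineData with a MIME type.
          { inlineData: { mimeType: 'image/png', data: imageBase64 } },
        ],
      },
    ],
  };
}

const req = buildMultimodalRequest('What does this chart show?', 'iVBORw0KG...');
console.log(req.contents[0].parts.length); // 2 parts: text + image
```

The same parts structure extends to audio, video, and PDF inputs by changing the MIME type, which is how hour-long videos fit into a single request.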

Granular Developer Control

A standout feature is the introduction of 'Thinking Levels' (Minimal, Low, Medium, High). This parameter allows developers to granularly dial the model's reasoning depth up or down based on the task's complexity. This flexibility ensures that users don't overpay for simple tasks like classification while still having access to enhanced logic for more structured outputs like UI generation and data extraction.
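
In practice this invites a routing layer that picks a thinking level per request. A hypothetical sketch, assuming the four level names from this page; the task categories and defaults are illustrative, not an official API:

```javascript
// Map task complexity to a thinking level so cheap tasks don't pay for
// deep reasoning. Level names come from the model's documented options.
const THINKING_LEVELS = {
  classification: 'minimal',
  summarization: 'low',
  data_extraction: 'medium',
  ui_generation: 'high',
};

function thinkingLevelFor(taskType) {
  // Unknown task types fall back to a cheap-but-safe default.
  return THINKING_LEVELS[taskType] ?? 'low';
}

console.log(thinkingLevelFor('classification')); // 'minimal'
console.log(thinkingLevelFor('translation'));    // 'low' (fallback)
```

The returned level would then be passed through the request's thinking configuration, as in the quick-start snippet later on this page.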

Use Cases for Gemini 3.1 Flash-Lite

Discover the different ways you can use Gemini 3.1 Flash-Lite to achieve great results.

High-Volume Real-Time Translation

Seamlessly process thousands of chat messages or support tickets across 100+ languages with minimal latency and high cost-efficiency.

Multimodal Content Moderation

Utilize native video and image processing to flag inappropriate content in high-throughput social media feeds or video platforms.

Automated Structured Data Extraction

Extract complex JSON schemas from massive PDF archives or long-form legal documents using the 1M token context window.
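
Structured extraction is typically driven by the Gemini API's structured-output settings (`responseMimeType` plus a `responseSchema`). A sketch of such a config for a contract-extraction task; the schema's field names are illustrative assumptions:

```javascript
// Request config asking the model to return JSON that conforms to a schema,
// in the style of the Gemini API's structured-output options.
const extractionConfig = {
  responseMimeType: 'application/json',
  responseSchema: {
    type: 'object',
    properties: {
      parties: { type: 'array', items: { type: 'string' } },
      effectiveDate: { type: 'string' },
      totalValueUSD: { type: 'number' },
    },
    required: ['parties', 'effectiveDate'],
  },
};

console.log(Object.keys(extractionConfig.responseSchema.properties)); // field names
```

Constraining the output this way makes downstream parsing deterministic, which matters when the input is a thousand-page PDF archive rather than a single document.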

Agile Front-End Prototyping

Rapidly generate functional React/Tailwind UI components and landing pages at over 360 tokens per second for iterative design.

Agentic Task Orchestration

Power 'always-on' AI agents that perform multi-step planning, web research, and tool use without breaking the token budget.
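
Tool use in the Gemini API is declared through `functionDeclarations` the model can choose to call. A minimal sketch; the `search_web` function and its parameters are hypothetical, standing in for whatever tools an agent exposes:

```javascript
// Declare a web-search tool in the Gemini API's functionDeclarations shape
// so the model can request a search step during multi-step planning.
const tools = [
  {
    functionDeclarations: [
      {
        name: 'search_web', // hypothetical tool; implement and dispatch it yourself
        description: 'Search the web and return the top result snippets.',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string', description: 'The search query.' },
          },
          required: ['query'],
        },
      },
    ],
  },
];

console.log(tools[0].functionDeclarations[0].name); // 'search_web'
```

The agent loop then watches responses for a function call, executes the named tool, and feeds the result back as the next turn.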

Low-Latency Customer Service Bots

Deploy conversational assistants that provide instantaneous responses with adjustable reasoning for simple vs. complex queries.

Strengths

Unmatched Throughput: Streams at 363 tokens per second, making it 45% faster than 2.5 Flash for real-time agentic applications.
Aggressive Pricing: At $0.25/M input tokens, it is roughly 1/8th the cost of Gemini 3.1 Pro while maintaining high general intelligence.
Native Multimodal Mastery: Exceptional performance on vision (92% DocVQA) and video (84.8% VideoMMMU) without requiring separate encoders.
Granular Compute Control: The first model to offer precise control over reasoning depth, enabling optimization of the cost-to-performance ratio.

Limitations

Reasoning Ceiling: Significantly lower performance on abstract logic (12% ARC-AGI v2) compared to flagship reasoning-specific models.
Math Olympiad Gaps: Struggles with elite-level mathematics, scoring only 25% on AIME 2025 compared to 90%+ for frontier models.
Factuality Calibration: Faces higher hallucination rates in fact-seeking tasks (43.3% SimpleQA) than Pro-tier or frontier alternatives.
Instruction Drift: Can occasionally miss minor formatting constraints in extremely long, complex multi-step instructions.

API Quick Start

google/gemini-3.1-flash-lite-preview

View Documentation
Google SDK (JavaScript)
import { GoogleGenAI } from '@google/genai';

// The @google/genai SDK takes an options object, not a bare API key
const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

async function generate() {
  const response = await ai.models.generateContent({
    model: 'gemini-3.1-flash-lite-preview',
    contents: 'Extract key entities from this document.',
    config: {
      thinkingConfig: { thinkingLevel: 'low' },
    },
  });
  console.log(response.text);
}

generate();

Install the SDK and start making API calls in minutes.

What People Are Saying About Gemini 3.1 Flash-Lite

See what the community thinks about Gemini 3.1 Flash-Lite

Flash lite is crazy fast and effective for specific workflows like summarization... this is a welcome speed jump.
reddit user
reddit
Gemini 3.1 Flash-Lite is the quiet kill shot for mid-tier API providers... the cost curves compound fast.
@9chaku
twitter
3.1 Flash-Lite outperforms 2.5 Flash across a majority of benchmarks while being a little speedster!
Tulsee Doshi
twitter
For builders running AI agents at scale, this is the model that makes 'always-on' actually affordable. 363 t/s is wild.
@prince_twets
twitter
The pricing is insane. $0.25 for 1M input makes it cheaper to just feed entire repos into context than build RAG.
reddit user
reddit
The speed to first token is basically instant. It's the first time a model has felt faster than my own typing.
DevGuru
hackernews

Videos About Gemini 3.1 Flash-Lite

Watch tutorials, reviews, and discussions about Gemini 3.1 Flash-Lite

Pricing comes in at 25 cents per 1 million input tokens and $1.50 per 1 million output tokens... still quite competitive considering the speed.

I am finding this model to be an underrated coding model focusing on front-end development and it delivers extremely fast tokens.

This is really targeting the developer who needs scale without the latency of a Pro model.

The multimodality here isn't just a gimmick; it's handling complex PDFs with ease.

Google is really pushing the boundary of what a 'lite' model can actually achieve in 2026.

This time, it's Gemini 3.1 Flash Light, which is supposed to be a faster and less expensive version of the Flash model.

These models are needed because you want to use them in applications where you need high throughput.

The 1 million context window is standard now for Gemini, but seeing it on a model this fast is impressive.

It's not going to win a math olympiad, but it's perfect for extraction and summarization.

The API latency is significantly lower than GPT-4o-mini in my early testing.

This new AI model from Google is 45% faster... and it might just change how every single one of us builds with AI.

Low thinking mode for the quick, easy stuff. High thinking mode for the heavy lifting... that flexibility is what separates a toy from a real tool.

For SEO tasks, this is going to be my daily driver because of the price point.

The fact that it can see a video and understand the context almost instantly is a game changer for content creators.

Google is making it very hard to justify using other providers for high-volume tasks right now.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for Gemini 3.1 Flash-Lite

Expert tips to help you get the most out of Gemini 3.1 Flash-Lite and achieve better results.

Leverage Thinking Levels

Set thinking_level to 'minimal' for simple tasks like classification to maximize speed, but use 'high' for structured code generation.

Native Video Analysis

Feed raw video files directly into the API for faster insights on visual events and audio cues simultaneously, bypassing transcript steps.

Context Over RAG

For datasets under 1M tokens, feed the entire document set into the context window to eliminate retrieval errors and vector DB costs.
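
Before committing to this pattern, it helps to check whether the corpus actually fits. A rough sketch using the common ~4-characters-per-token rule of thumb (an approximation, not an exact tokenizer count), with headroom reserved for the prompt and response:

```javascript
// Rough fit check: does a document set fit the 1M-token context window,
// leaving room for the prompt and the model's response?
const CONTEXT_WINDOW = 1_000_000;

function fitsInContext(documents, reservedTokens = 50_000) {
  const totalChars = documents.reduce((sum, doc) => sum + doc.length, 0);
  const estimatedTokens = Math.ceil(totalChars / 4); // ~4 chars/token heuristic
  return estimatedTokens + reservedTokens <= CONTEXT_WINDOW;
}

console.log(fitsInContext(['a'.repeat(3_000_000)])); // true  (~750K tokens)
console.log(fitsInContext(['a'.repeat(4_000_000)])); // false (~1M tokens + reserve)
```

For an exact count, the API's token-counting endpoint is the authoritative check; the heuristic is just a cheap pre-filter.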

Optimize with Batching

Use the batching API for non-urgent tasks to further reduce costs, as Flash-Lite is specifically optimized for asynchronous processing.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context · $5.00 / $25.00 per 1M tokens

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context · $3.00 / $15.00 per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context · $0.60 / $2.50 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI's advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context · $1.25 / $10.00 per 1M tokens

GLM-4.7 (Zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context · $0.60 / $2.20 per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context · $0.60 / $3.60 per 1M tokens

Claude 3.7 Sonnet (Anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.
200K context · $3.00 / $15.00 per 1M tokens

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
128K context · $3.00 / $15.00 per 1M tokens

Frequently Asked Questions About Gemini 3.1 Flash-Lite

Find answers to common questions about Gemini 3.1 Flash-Lite