What is the context window for Gemini 3.1 Flash Live?

The model supports a 131,072-token input context window and a 65,536-token output window. This enables it to remember long conversations and process substantial document history during a live session.
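To make the input limit concrete, here is a minimal sketch of keeping a long conversation inside the 131,072-token input window by dropping the oldest turns first. The `Turn` shape, the `trimToBudget` helper, and the idea that each turn already carries a token count are assumptions for illustration. They are not part of the Live API.

```typescript
// Input limit quoted in the answer above.
const INPUT_CONTEXT_TOKENS = 131_072;

// Hypothetical turn record; token counts are assumed to be computed elsewhere.
interface Turn {
  role: "user" | "model";
  tokens: number;
  text: string;
}

// Walk backwards from the newest turn and keep turns until the budget is spent.
function trimToBudget(turns: Turn[], budget: number = INPUT_CONTEXT_TOKENS): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    if (used + turns[i].tokens > budget) break;
    used += turns[i].tokens;
    kept.unshift(turns[i]); // preserve chronological order
  }
  return kept;
}
```

Dropping whole turns from the oldest end is the simplest policy. A production system might summarize old turns instead of discarding them.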

How much does the API cost?

Text input costs $0.75 per 1 million tokens and text output costs $4.50 per 1 million tokens. Audio input costs approximately $0.005 per minute, while audio output costs approximately $0.018 per minute.
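As a rough budgeting aid, the four rates above can be combined into a per-session estimate. This is a sketch that assumes the quoted rates apply uniformly. The `SessionUsage` shape and `estimateSessionCostUsd` helper are hypothetical names, and preview pricing may change.

```typescript
// Rates quoted in the answer above (USD); treat them as approximate.
const RATES = {
  textInputPerMTok: 0.75,
  textOutputPerMTok: 4.5,
  audioInputPerMin: 0.005,
  audioOutputPerMin: 0.018,
};

// Hypothetical usage record for one session.
interface SessionUsage {
  textInputTokens: number;
  textOutputTokens: number;
  audioInputMinutes: number;
  audioOutputMinutes: number;
}

// Sum the four line items: two token-based, two minute-based.
function estimateSessionCostUsd(u: SessionUsage): number {
  return (
    (u.textInputTokens / 1_000_000) * RATES.textInputPerMTok +
    (u.textOutputTokens / 1_000_000) * RATES.textOutputPerMTok +
    u.audioInputMinutes * RATES.audioInputPerMin +
    u.audioOutputMinutes * RATES.audioOutputPerMin
  );
}

// Example: 200K text tokens in, 50K out, plus ten minutes of audio each way.
const exampleCost = estimateSessionCostUsd({
  textInputTokens: 200_000,
  textOutputTokens: 50_000,
  audioInputMinutes: 10,
  audioOutputMinutes: 10,
});
console.log(`Estimated cost: $${exampleCost.toFixed(3)}`);
```

In this example the line items are $0.15, $0.225, $0.05, and $0.18, for a total of about $0.605.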

Does this model support function calling?

Yes, Gemini 3.1 Flash Live supports synchronous function calling. The model pauses its audio response to execute the tool and waits for the tool output before continuing.
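The synchronous flow described above can be sketched as a small dispatcher: the application receives one or more function calls, runs each in order, and returns every result before the model resumes. The `FunctionCall` and `FunctionResponse` shapes, the `getOrderStatus` tool, and `runToolCalls` are illustrative assumptions, not the SDK's own types.

```typescript
// Illustrative request/response shapes; the real SDK types may differ.
interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface FunctionResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}
type ToolHandler = (args: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Hypothetical tool registry with a single stubbed tool.
const handlers: Record<string, ToolHandler> = {
  getOrderStatus: async (args) => ({ orderId: args["orderId"], status: "shipped" }),
};

// Run the calls one at a time, mirroring the sequential behavior described above.
async function runToolCalls(calls: FunctionCall[]): Promise<FunctionResponse[]> {
  const responses: FunctionResponse[] = [];
  for (const call of calls) {
    const handler = handlers[call.name];
    const response = handler
      ? await handler(call.args)
      : { error: `Unknown tool: ${call.name}` };
    responses.push({ id: call.id, name: call.name, response });
  }
  return responses;
}
```

Because the model stays silent while tools run, keeping each handler fast (or answering with a short status message first) matters more here than in text-only applications.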

How does thinking work in this model?

Gemini 3.1 Flash Live uses configurable reasoning levels (minimal, low, medium, high) instead of a fixed token budget. Minimal is the default setting to ensure the lowest latency in voice applications.
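One way to handle the four levels in application code is a small validator that falls back to the default. This is a sketch only: `resolveThinkingLevel` is a hypothetical helper, and how the resulting value is passed to the API is not shown here.

```typescript
// The four levels named above; "minimal" is described as the default.
const THINKING_LEVELS = ["minimal", "low", "medium", "high"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

// Normalize free-form input (for example, from a settings file) to a valid level.
function resolveThinkingLevel(input?: string): ThinkingLevel {
  const normalized = (input ?? "").trim().toLowerCase();
  const match = THINKING_LEVELS.find((level) => level === normalized);
  return match ?? "minimal"; // unknown or missing values fall back to the default
}
```

Falling back to "minimal" keeps latency low when a configuration value is missing or misspelled.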

Can it see my screen in real time?

Yes, the model can ingest continuous video frames through the Live API. This allows it to analyze screen content or camera feeds while speaking with the user.

Is there a free tier available?

Yes, Google AI Studio offers free access to the Gemini 3.1 Flash Live Preview for testing and development. Free tier data may be used to improve Google products.

Which languages are supported?

The model supports over 70 languages for text and audio. This coverage enables global real-time translation and localized customer service.

Gemini 3.1 Flash Live Preview

Gemini 3.1 Flash Live Preview is Google's ultra-low-latency, audio-to-audio model featuring a 131K context window, high-fidelity multimodal reasoning, and...

Multimodal | Audio-to-Audio | Low Latency | Voice AI | Real-Time

Google | Gemini | March 26, 2026

Context: 131K tokens
Max Output: 66K tokens
Input Price: $0.75 / 1M tokens
Output Price: $4.50 / 1M tokens

Modality: Text, Image, Audio, Video

Capabilities: Vision, Tools, Streaming, Reasoning

Benchmarks

GPQA: 94%
HLE: 44%
MMLU: 91%
MMLU Pro: 89%
SimpleQA: 80%
IFEval: 88%
AIME 2025: 95%
MATH: 100%
GSM8k: 99%
MGSM: 92%
MathVista: 72%
SWE-Bench: 81%
HumanEval: 73%
LiveCodeBench: 80%
MMMU: 69%
MMMU Pro: 60%
ChartQA: 90%
DocVQA: 94%
Terminal-Bench: 69%
ARC-AGI: 77%

About Gemini 3.1 Flash Live Preview

Learn about Gemini 3.1 Flash Live Preview's capabilities, features, and how it can help you achieve better results.

Gemini 3.1 Flash Live Preview is a low-latency, multimodal model designed for real-time, audio-to-audio dialogue. It operates on Google's Gemini 3 architecture. A Sparse Mixture-of-Experts (MoE) design maintains high performance while reducing inference costs. Traditional models perform speech-to-text followed by text-to-speech. This model processes audio streams natively. It detects acoustic nuances such as tone, emotion, and background noise for natural interactions. Learn more in the official documentation.

Developers use this model for voice-first applications requiring numeric precision and immediate feedback. It supports configurable thinking levels ranging from minimal to high. This allows users to balance reasoning depth against latency requirements. With a 131,072-token context window and support for text, images, and video, it acts as a versatile engine. Target use cases include real-time agents, automated customer support, and collaborative coding environments.

Interrupt handling and noise filtering make it suited for real-world deployments. The model ignores siren and crowd noise while maintaining conversation flow. Developers access it through the Live API, building mobile and kiosk applications without separate transcription services.

Use Cases

Discover the different ways you can use Gemini 3.1 Flash Live Preview to achieve great results.

Real-Time Voice Agents

Builds conversational AI that responds instantly to user speech for hospitality, travel, and logistics support.

Live Multimodal Coaching

Provides immediate fitness or technical training by analyzing a user's camera feed and audio simultaneously.

Collaborative Coding Assistants

Directs an IDE to refactor code and update UI components through continuous voice instructions and screen sharing.

Low-Latency Translation

Facilitates cross-lingual conversations by translating speech-to-speech with preserved emotional context.

Noisy Environment Support

Powers customer service kiosks in high-traffic urban areas where the system must filter out siren and crowd noise.

Interactive NPC Gaming

Drives non-player characters that respond with natural vocal inflection and react to a player's physical movements.

Strengths

Native Audio Processing: Operates strictly speech-to-speech, detecting verbal nuances like frustration or sarcasm that text-based models miss.

High Speed Performance: Features a 2.5x faster Time to First Token (TTFT) compared to its predecessors.

Robust Noise Filtering: Maintains 95.9% accuracy on Big Bench Audio even in noisy environments like restaurants or busy roads.

Configurable Reasoning: Allows developers to dial the 'thinkingLevel' up or down to find the optimal balance between logic and speed.

Limitations

Synchronous Tool Use: Function calling operates sequentially, meaning the model stops speaking entirely while waiting for tool responses.

Lower Zero-Shot Logic: Raw reasoning scores sit below the Gemini 3.1 Pro flagship for complex PhD-level tasks.

Pricing Complexity: Multiple rate tiers for text, audio, and video make budgeting for multimodal applications difficult to predict.

Preview Status: Currently in preview, which subjects developers to rate limit fluctuations and unannounced behavioral tuning.

API Quick Start

google/gemini-3.1-flash-live-preview

google SDK

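import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

async function run() {
  // Live models are reached through a streaming session rather than generateContent.
  const session = await ai.live.connect({
    model: "gemini-3.1-flash-live-preview",
    config: {
      responseModalities: [Modality.AUDIO],
      // Assumed placement of the thinking level; confirm against the Live API reference.
      thinkingConfig: { thinkingLevel: "minimal" },
    },
    callbacks: {
      // Server messages (audio chunks, tool calls, turn signals) arrive here.
      onmessage: (message) => console.log(message),
      onerror: (e) => console.error(e.message),
      onclose: () => console.log("Session closed."),
    },
  });
  // Send one text turn; the reply streams back through onmessage.
  session.sendClientContent({ turns: "Hello, can you hear me?", turnComplete: true });
}
run();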

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Gemini 3.1 Flash Live Preview

“Gemini 3.1 Flash-Lite is rolling out... fastest and most cost-efficient Gemini 3 series model yet.”

— BuildwithVignesh

“Matches 2.5 Flash quality at Flash-Lite cost. Low-latency, audio-to-audio model optimized for real-time dialogue.”

— Google AI

twitter

“3 Flash degrades a lot as context increases, but it is a massive improvement for real-time responsiveness.”

— Pasto_Shouwa

“Google is really squeezing the margins on input tokens with 3.1 Flash. It's becoming hard to justify using anything else for simple agents.”

— AI_Dev_Master

hackernews

“The raw speech-to-speech architecture completely eliminates the awkward pauses you get with chained transcription models.”

— AIExplorer

youtube

“Testing the new Gemini 3.1 Flash Live Preview. The configurable thinking levels are incredibly useful for balancing speed vs reasoning.”

— DevGuru_X

twitter

Pro Tips

Expert tips to help you get the most out of Gemini 3.1 Flash Live Preview and achieve better results.

Adjust Thinking Levels

Set the 'thinkingLevel' to 'minimal' for the fastest voice responses or 'high' for complex multi-step logical tasks.

Use Incremental Updates

Send text updates via 'send_realtime_input' during active audio sessions to provide the model with changing context.

Optimize Turn Coverage

Set turn coverage to 'TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO' for comprehensive multimodal understanding.

Seed Initial Context

Use 'send_client_content' to establish a conversation's history before starting a Live API session for better continuity.

Related AI Models

Claude Opus 4.7

Anthropic

Claude Opus 4.7 is Anthropic's flagship model with a 1-million-token context, adaptive reasoning, and 3.3x vision resolution for enterprise-scale agents.

1M context

$5.00/$25.00/1M

Gemini 3.1 Pro

Google

Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.

1M context

$2.00/$12.00/1M

GPT-5.5

OpenAI

GPT-5.5 is OpenAI's flagship frontier model with a 1M context window and five reasoning effort levels, optimized for autonomous agentic workflows and coding.

1M context

$5.00/$30.00/1M

Grok-3

xAI

Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.

1M context

$3.00/$15.00/1M

Kimi K3

Moonshot

Kimi K3 is Moonshot AI's 2.8T MoE model with a 1M token context window, native multimodal vision, and frontier-tier coding performance for complex agents.

1M context

$3.00/$15.00/1M

GPT-5.2 Pro

OpenAI

GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.

400K context

$21.00/$168.00/1M

Qwen 3.7 Max

Alibaba

Qwen 3.7 Max is Alibaba’s flagship AI model for deep reasoning and autonomous agent tasks, featuring a 256k context window and top-tier coding performance.

256K context

$1.20/$6.00/1M

Gemini 3 Pro

Google

Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.

1M context

$2.00/$12.00/1M

Frequently Asked Questions

Find answers to common questions about Gemini 3.1 Flash Live Preview
