
Gemini 3.1 Flash Live Preview

Gemini 3.1 Flash Live Preview is Google's ultra-low-latency, audio-to-audio model featuring a 131K context window, high-fidelity multimodal reasoning, and...

Multimodal, Audio-to-Audio, Low Latency, Voice AI, Real-Time
Google | Gemini | March 26, 2026
Context: 131K tokens
Max Output: 66K tokens
Input Price: $0.75 / 1M tokens
Output Price: $4.50 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming, Reasoning
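A quick way to sanity-check the listed rates is to turn token counts into a dollar estimate. The sketch below assumes the $0.75 input and $4.50 output prices are per 1M text tokens; audio and video are billed at other tiers that this page does not list, so `estimateCostUsd` and its default rate table are hypothetical and illustrative only.

```typescript
// Hypothetical cost helper. Rates are the listed prices in USD per 1M tokens;
// ASSUMPTION: they apply to text tokens only (audio/video use other tiers).
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

const LISTED_TEXT_RATES: Rates = { inputPerMillion: 0.75, outputPerMillion: 4.5 };

// Returns the estimated cost in USD for one request or session.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  rates: Rates = LISTED_TEXT_RATES,
): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}

// 200K input + 50K output tokens: 0.2 * 0.75 + 0.05 * 4.5 = 0.375 USD
console.log(estimateCostUsd(200_000, 50_000).toFixed(3)); // 0.375
```

At these rates output tokens cost six times as much as input tokens, so long replies drive the bill faster than long prompts do.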
Benchmarks
GPQA
94%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Gemini 3.1 Flash Live Preview scored 94% on this benchmark.
HLE
44%
HLE: Humanity's Last Exam. A benchmark of difficult, expert-written questions spanning dozens of academic subjects, built to test reasoning at the frontier of human expertise. Evaluates deep understanding of complex topics that require professional-level knowledge. Gemini 3.1 Flash Live Preview scored 44% on this benchmark.
MMLU
91%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Gemini 3.1 Flash Live Preview scored 91% on this benchmark.
MMLU Pro
89%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers 14 subject areas: Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, Computer Science, History, and Other. Gemini 3.1 Flash Live Preview scored 89% on this benchmark.
SimpleQA
80%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to short, fact-seeking questions. Measures how reliably the model retrieves knowledge and how often it hallucinates. Gemini 3.1 Flash Live Preview scored 80% on this benchmark.
IFEval
88%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Gemini 3.1 Flash Live Preview scored 88% on this benchmark.
AIME 2025
95%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Gemini 3.1 Flash Live Preview scored 95% on this benchmark.
MATH
100%
MATH: Mathematical Problem Solving. A benchmark of 12,500 competition mathematics problems spanning prealgebra, algebra, intermediate algebra, number theory, counting and probability, geometry, and precalculus. Requires multi-step reasoning and formal mathematical knowledge. Gemini 3.1 Flash Live Preview scored 100% on this benchmark.
GSM8k
99%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Gemini 3.1 Flash Live Preview scored 99% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. 250 problems from GSM8k translated into 10 languages, including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Gemini 3.1 Flash Live Preview scored 92% on this benchmark.
MathVista
72%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Gemini 3.1 Flash Live Preview scored 72% on this benchmark.
SWE-Bench
81%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Gemini 3.1 Flash Live Preview scored 81% on this benchmark.
HumanEval
73%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Gemini 3.1 Flash Live Preview scored 73% on this benchmark.
LiveCodeBench
80%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Gemini 3.1 Flash Live Preview scored 80% on this benchmark.
MMMU
69%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Gemini 3.1 Flash Live Preview scored 69% on this benchmark.
MMMU Pro
60%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Gemini 3.1 Flash Live Preview scored 60% on this benchmark.
ChartQA
90%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Gemini 3.1 Flash Live Preview scored 90% on this benchmark.
DocVQA
94%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Gemini 3.1 Flash Live Preview scored 94% on this benchmark.
Terminal-Bench
69%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Gemini 3.1 Flash Live Preview scored 69% on this benchmark.
ARC-AGI
77%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Gemini 3.1 Flash Live Preview scored 77% on this benchmark.

About Gemini 3.1 Flash Live Preview


Gemini 3.1 Flash Live Preview is a low-latency, multimodal model designed for real-time, audio-to-audio dialogue. It is built on Google's Gemini 3 architecture, whose sparse Mixture-of-Experts (MoE) design maintains high performance while reducing inference costs. Traditional voice pipelines chain speech-to-text, a language model, and text-to-speech; this model instead processes audio streams natively, detecting acoustic nuances such as tone, emotion, and background noise for more natural interactions. Learn more in the official documentation.
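The page does not describe how Gemini routes tokens internally, but the general idea of a sparse Mixture-of-Experts layer can be sketched: a gate scores every expert, only the top-k experts actually run, and their outputs are mixed using the gate weights renormalized over the chosen experts. The sketch below is a generic top-k softmax router with toy vector "experts"; `routeTopK` and `moeForward` are hypothetical names, and this is not Gemini's implementation.

```typescript
// Generic top-k sparse Mixture-of-Experts routing (illustrative only; not
// Gemini's implementation). Each "expert" is a toy vector function.
type Expert = (x: number[]) => number[];

interface Route {
  index: number;
  weight: number;
}

// Keeps the k highest gate logits and softmax-normalizes over just those.
function routeTopK(gateLogits: number[], k: number): Route[] {
  if (k < 1 || gateLogits.length === 0) {
    throw new RangeError("need k >= 1 and at least one expert");
  }
  const top = gateLogits
    .map((logit, index) => ({ index, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const maxLogit = top[0].logit; // subtract the max for numerical stability
  const exps = top.map((t) => Math.exp(t.logit - maxLogit));
  const total = exps.reduce((a, b) => a + b, 0);
  return top.map((t, i) => ({ index: t.index, weight: exps[i] / total }));
}

// Evaluates only the routed experts and returns their weighted sum.
function moeForward(x: number[], gateLogits: number[], experts: Expert[], k: number): number[] {
  const out = x.map(() => 0);
  for (const { index, weight } of routeTopK(gateLogits, k)) {
    const y = experts[index](x);
    for (let d = 0; d < out.length; d++) out[d] += weight * y[d];
  }
  return out;
}

// Four toy experts that scale the input by 0, 1, 2 and 3.
const experts: Expert[] = [0, 1, 2, 3].map((s) => (x: number[]) => x.map((v) => v * s));

// Experts 1 and 3 tie for the top two logits, so each gets weight 0.5:
// every output element is 0.5 * 1 + 0.5 * 3 = 2, and experts 0 and 2 never run.
console.log(moeForward([1, 1], [0.1, 2.0, -1.0, 2.0], experts, 2));
```

The cost saving comes from the loop in `moeForward`: compute grows with k, not with the total number of experts.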

Developers use this model for voice-first applications that need numerical precision and immediate feedback. Configurable thinking levels, ranging from minimal to high, let developers trade reasoning depth against latency. Its 131,072-token context window and support for text, image, and video input alongside audio make it suitable for a broad range of workloads, including real-time agents, automated customer support, and collaborative coding environments.
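To put the 131,072-token window in perspective, the sketch below estimates how much spoken conversation fits before older turns must be dropped. It assumes audio is tokenized at 32 tokens per second, the rate Google documents for Gemini audio input; whether the Live API applies the same rate to the model's audio output, and how much overhead system prompts, text, or video add, are assumptions to verify. The helper names are hypothetical.

```typescript
// Rough context-budget arithmetic for an audio conversation.
// ASSUMPTION: 32 tokens per second of audio (Google's documented rate for
// Gemini audio input), applied to both user and model audio.
const CONTEXT_WINDOW_TOKENS = 131_072; // 128 * 1024
const AUDIO_TOKENS_PER_SECOND = 32;

// Minutes of audio that fit after reserving tokens for a system prompt, tools, etc.
function audioMinutesThatFit(reservedTokens = 0): number {
  const available = Math.max(0, CONTEXT_WINDOW_TOKENS - reservedTokens);
  return available / AUDIO_TOKENS_PER_SECOND / 60;
}

// Tokens consumed by a conversation of the given length in minutes.
function tokensForAudioMinutes(minutes: number): number {
  return Math.ceil(minutes * 60 * AUDIO_TOKENS_PER_SECOND);
}

console.log(audioMinutesThatFit().toFixed(1)); // 68.3
console.log(tokensForAudioMinutes(30)); // 57600
```

Under these assumptions a 30-minute exchange uses about 57,600 tokens, well inside the window, while roughly 68 minutes of continuous audio would fill it.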

Interrupt handling and noise filtering make it well suited to real-world deployments: the model filters out background noise such as sirens and crowds without losing the flow of the conversation. Developers access it through the Live API and can build mobile and kiosk applications without a separate transcription service.
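On the client side, handling an interruption usually means discarding any reply audio that has been received but not yet played, so the spoken answer stops at once. The sketch below assumes a message shape modeled on the Live API's `serverContent.interrupted` flag and `modelTurn.parts[].inlineData` audio chunks; the `ServerMessage` interface is a hand-written subset rather than the SDK's own type, and `PlaybackQueue` is a hypothetical helper.

```typescript
// Minimal, hand-written subset of a Live API server message (ASSUMED shape,
// modeled on serverContent.interrupted and modelTurn.parts[].inlineData).
interface ServerMessage {
  serverContent?: {
    interrupted?: boolean;
    modelTurn?: { parts?: { inlineData?: { data?: string; mimeType?: string } }[] };
  };
}

// Hypothetical client-side buffer of base64 audio chunks awaiting playback.
class PlaybackQueue {
  private chunks: string[] = [];

  handle(message: ServerMessage): void {
    const content = message.serverContent;
    if (!content) return;
    if (content.interrupted) {
      // The user barged in: drop everything not yet played so the reply stops now.
      this.chunks = [];
      return;
    }
    for (const part of content.modelTurn?.parts ?? []) {
      const data = part.inlineData?.data;
      if (data) this.chunks.push(data);
    }
  }

  // Returns the next chunk to hand to the audio output, if any.
  next(): string | undefined {
    return this.chunks.shift();
  }

  get pending(): number {
    return this.chunks.length;
  }
}

const queue = new PlaybackQueue();
queue.handle({
  serverContent: {
    modelTurn: { parts: [{ inlineData: { data: "AAA=" } }, { inlineData: { data: "BBB=" } }] },
  },
});
console.log(queue.pending); // 2
queue.handle({ serverContent: { interrupted: true } });
console.log(queue.pending); // 0
```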


Use Cases


Real-Time Voice Agents

Builds conversational AI that responds instantly to user speech for hospitality, travel, and logistics support.

Live Multimodal Coaching

Provides immediate fitness or technical training by analyzing a user's camera feed and audio simultaneously.

Collaborative Coding Assistants

Directs an IDE to refactor code and update UI components through continuous voice instructions and screen sharing.

Low-Latency Translation

Facilitates cross-lingual conversations by translating speech-to-speech with preserved emotional context.

Noisy Environment Support

Powers customer service kiosks in high-traffic urban areas where the system must filter out siren and crowd noise.

Interactive NPC Gaming

Drives non-player characters that respond with natural vocal inflection and react to a player's physical movements.

Strengths

Native Audio Processing: Works directly on speech rather than on a transcript, detecting verbal nuances like frustration or sarcasm that text-based models miss.
High Speed Performance: Delivers a Time to First Token (TTFT) roughly 2.5x faster than the previous generation.
Robust Noise Filtering: Holds up in noisy settings such as restaurants or busy roads, alongside a 95.9% score on the Big Bench Audio reasoning benchmark.
Configurable Reasoning: Lets developers dial the 'thinkingLevel' up or down to balance reasoning depth against speed.

Limitations

Synchronous Tool Use: Function calls run sequentially, so the model stops speaking entirely while it waits for tool responses.
Lower Zero-Shot Logic: Raw reasoning scores sit below the Gemini 3.1 Pro flagship on complex PhD-level tasks.
Pricing Complexity: Separate rate tiers for text, audio, and video make costs for multimodal applications hard to predict.
Preview Status: As a preview model, it is subject to rate-limit fluctuations and unannounced behavioral tuning.
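Because the model stays silent while a tool call is outstanding, the client should answer function calls as quickly and as reliably as possible. The sketch below is a hypothetical dispatcher, `buildFunctionResponses`, that maps incoming calls to local handlers and returns results (or errors) in the `{ id, name, response }` form that the Live API's `sendToolResponse({ functionResponses })` accepts; the interfaces are a hand-written subset, not the SDK's types, and `get_order_status` is an invented example tool.

```typescript
// Hand-written subset of the Live API tool-call shapes (ASSUMED; modeled on
// toolCall.functionCalls and sendToolResponse({ functionResponses })).
interface FunctionCall {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponse {
  id?: string;
  name?: string;
  response: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical dispatcher: answers every call, echoing its id so the model can
// match results to requests. Failures are returned as data instead of thrown,
// so one bad call never leaves the model waiting.
function buildFunctionResponses(
  calls: FunctionCall[],
  handlers: Record<string, ToolHandler>,
): FunctionResponse[] {
  return calls.map((call) => {
    const known =
      call.name !== undefined && Object.prototype.hasOwnProperty.call(handlers, call.name);
    if (!known) {
      return { id: call.id, name: call.name, response: { error: `unknown function: ${call.name}` } };
    }
    try {
      const handler = handlers[call.name as string];
      return { id: call.id, name: call.name, response: handler(call.args ?? {}) };
    } catch (err) {
      return { id: call.id, name: call.name, response: { error: String(err) } };
    }
  });
}

// Example with one hypothetical tool.
const handlers: Record<string, ToolHandler> = {
  get_order_status: (args) => ({ orderId: args.orderId, status: "shipped" }),
};
const responses = buildFunctionResponses(
  [{ id: "call-1", name: "get_order_status", args: { orderId: "A42" } }],
  handlers,
);
console.log(JSON.stringify(responses));
// In a live session: session.sendToolResponse({ functionResponses: responses });
```

Checking `hasOwnProperty` rather than indexing directly keeps a call named, say, `toString` from accidentally invoking an inherited object method.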

API Quick Start

google/gemini-3.1-flash-live-preview

Google Gen AI SDK (JavaScript)
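import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

async function run() {
  // Live models are served over a streaming session (ai.live.connect), not generateContent.
  const session = await ai.live.connect({
    model: "gemini-3.1-flash-live-preview",
    config: {
      responseModalities: [Modality.AUDIO], // audio-to-audio: replies arrive as audio
      outputAudioTranscription: {}, // also stream a text transcript of each reply
      thinkingConfig: { thinkingLevel: "minimal" }, // assumed placement; verify in the Live API docs
    },
    callbacks: {
      onmessage: (msg) => {
        const text = msg.serverContent?.outputTranscription?.text;
        if (text) process.stdout.write(text);
        if (msg.serverContent?.turnComplete) session.close();
      },
      onerror: (e) => console.error("Live API error:", e.message),
    },
  });
  // Text input for a live session; microphone audio goes through sendRealtimeInput({ audio }).
  session.sendRealtimeInput({ text: "Hello! Briefly introduce yourself." });
}
run().catch(console.error);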

Install the SDK with npm install @google/genai, set GOOGLE_API_KEY in your environment, and run the example above.

Community Feedback


Gemini 3.1 Flash-Lite is rolling out... fastest and most cost-efficient Gemini 3 series model yet.
BuildwithVignesh
reddit
Matches 2.5 Flash quality at Flash-Lite cost. Low-latency, audio-to-audio model optimized for real-time dialogue.
Google AI
twitter
3 Flash degrades a lot as context increases, but it is a massive improvement for real-time responsiveness.
Pasto_Shouwa
reddit
Google is really squeezing the margins on input tokens with 3.1 Flash. It's becoming hard to justify using anything else for simple agents.
AI_Dev_Master
hackernews
The raw speech-to-speech architecture completely eliminates the awkward pauses you get with chained transcription models.
AIExplorer
youtube
Testing the new Gemini 3.1 Flash Live Preview. The configurable thinking levels are incredibly useful for balancing speed vs reasoning.
DevGuru_X
twitter

Related Videos


You speak, it responds instantly. No lag, no loading, no weird pauses. It feels like talking to a real person.

It scores 95.9% on the Big Bench audio benchmark. That is best-in-class for audio reasoning.

You are not giving it instructions and waiting. You are co-building with it in real time.

The model can see your screen while you code and talk to you about the changes.

Pricing is split across text and audio, so you have to calculate your costs carefully.

This picks up on your tone, your pace, and your mood. It picks up on frustration or confusion.

Gemini 3.1 Flash Live scores number one in the world on the hardest AI voice benchmarks.

It actually understands complex topics. You can add reasoning to the level of AI you have.

You can interrupt it mid-sentence and it immediately stops and listens to the new instruction.

The 128K context window means it remembers the beginning of a 30-minute conversation.

It's no longer doing speech to text and then text to speech. It's just straight up speech to speech.

The agent being able to listen in noisy environments... like the side of the road or a noisy restaurant.

When I interrupted it, how fast it stopped talking... I think was really impressive.

You can combine this with local code agents to literally voice-command your software development.

The time to first token is roughly 2.5 times faster than the previous generation.


Pro Tips


Adjust Thinking Levels

Set the 'thinkingLevel' to 'minimal' for the fastest voice responses or 'high' for complex multi-step logical tasks.

Use Incremental Updates

Send text updates via 'send_realtime_input' during active audio sessions to provide the model with changing context.

Optimize Turn Coverage

Set turn coverage to 'TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO' for comprehensive multimodal understanding.

Seed Initial Context

Use 'send_client_content' at the start of a Live API session, before streaming audio, to seed the conversation history for better continuity.
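Seeding history amounts to sending earlier exchanges as alternating user and model turns without asking for a reply. The sketch below assumes the `{ role, parts: [{ text }] }` turn shape and a `turnComplete: false` flag as used with the Live API's `sendClientContent`; `StoredEntry` and `buildSeedPayload` are hypothetical names for a stored transcript and the helper that converts it.

```typescript
// Hand-written subset of the turn shape accepted by sendClientContent
// (ASSUMED; mirrors { role, parts: [{ text }] }).
interface SeedTurn {
  role: "user" | "model";
  parts: { text: string }[];
}

// Hypothetical stored history, e.g. from a previous support call.
interface StoredEntry {
  speaker: "customer" | "agent";
  text: string;
}

// Maps stored history to turns, skipping blank entries. turnComplete: false
// asks the model to absorb the context without generating a reply yet.
function buildSeedPayload(history: StoredEntry[]): { turns: SeedTurn[]; turnComplete: boolean } {
  const turns = history
    .filter((entry) => entry.text.trim().length > 0)
    .map((entry): SeedTurn => ({
      role: entry.speaker === "customer" ? "user" : "model",
      parts: [{ text: entry.text }],
    }));
  return { turns, turnComplete: false };
}

const payload = buildSeedPayload([
  { speaker: "customer", text: "My order A42 hasn't arrived." },
  { speaker: "agent", text: "It shipped yesterday and should arrive Friday." },
]);
console.log(JSON.stringify(payload.turns.map((t) => t.role))); // ["user","model"]
// In a live session: session.sendClientContent(payload), then start streaming audio.
```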


Related AI Models

Gemini 3.1 Pro (Google)
Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.
1M context | $2.00 / $12.00 per 1M tokens (input / output)

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
1M context | $3.00 / $15.00 per 1M tokens (input / output)

GPT-5.2 Pro (OpenAI)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context | $21.00 / $168.00 per 1M tokens (input / output)

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context | $2.00 / $12.00 per 1M tokens (input / output)

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
1M context | $5.00 / $25.00 per 1M tokens (input / output)

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context | $0.50 / $3.00 per 1M tokens (input / output)

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context | $3.00 / $15.00 per 1M tokens (input / output)

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context | $0.40 / $2.40 per 1M tokens (input / output)

Frequently Asked Questions

Find answers to common questions about Gemini 3.1 Flash Live Preview