
Qwen3.5-Omni

Qwen3.5-Omni is a natively omnimodal AI model from Alibaba Cloud, offering seamless audio-visual reasoning, real-time voice chat, and a 256K-token context window for low-latency applications.

Tags: Omnimodal, Real-time Voice, Video Vision, Alibaba Cloud, MoE
Alibaba · Qwen3.5 family · Released March 29, 2026
Context: 256K tokens
Max Output: 8K tokens
Input Price: $0.40 / 1M tokens
Output Price: $4.80 / 1M tokens
Modality: Text, Image, Audio, Video
Capabilities: Vision, Tools, Streaming
Benchmarks
GPQA
83.9%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3.5-Omni scored 83.9% on this benchmark.
HLE
34.2%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized domains, designed to probe professional-level knowledge and reasoning at the limits of current model capability. Qwen3.5-Omni scored 34.2% on this benchmark.
MMLU
94.2%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with approximately 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3.5-Omni scored 94.2% on this benchmark.
MMLU Pro
85.9%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3.5-Omni scored 85.9% on this benchmark.
SimpleQA
48.2%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Qwen3.5-Omni scored 48.2% on this benchmark.
IFEval
89.7%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3.5-Omni scored 89.7% on this benchmark.
AIME 2025
81.6%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3.5-Omni scored 81.6% on this benchmark.
MATH
90.4%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3.5-Omni scored 90.4% on this benchmark.
GSM8k
94.5%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3.5-Omni scored 94.5% on this benchmark.
MGSM
94.1%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3.5-Omni scored 94.1% on this benchmark.
MathVista
86.1%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Qwen3.5-Omni scored 86.1% on this benchmark.
SWE-Bench
75%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3.5-Omni scored 75% on this benchmark.
HumanEval
91.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3.5-Omni scored 91.2% on this benchmark.
LiveCodeBench
65.6%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3.5-Omni scored 65.6% on this benchmark.
MMMU
80.1%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Qwen3.5-Omni scored 80.1% on this benchmark.
MMMU Pro
73.9%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Qwen3.5-Omni scored 73.9% on this benchmark.
ChartQA
85.3%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Qwen3.5-Omni scored 85.3% on this benchmark.
DocVQA
95.2%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Qwen3.5-Omni scored 95.2% on this benchmark.
Terminal-Bench
52.5%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3.5-Omni scored 52.5% on this benchmark.
ARC-AGI
12.5%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3.5-Omni scored 12.5% on this benchmark.

About Qwen3.5-Omni

Learn about Qwen3.5-Omni's capabilities, features, and how it can help you achieve better results.

Unified Omnimodal Architecture

Qwen3.5-Omni is a natively omnimodal model developed by Alibaba Cloud, built on a unified architecture designed to process text, image, audio, and video inputs simultaneously. Unlike previous models that relied on separate encoders, Qwen3.5-Omni utilizes a Thinker-Talker architecture. The Thinker component performs complex multimodal reasoning across interleaved signals, while the Talker component generates high-quality, low-latency streaming speech. This allows the model to handle massive context, including up to 10 hours of audio or nearly seven minutes of 720p video in a single prompt.
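Over the OpenAI-compatible endpoint, interleaved multimodal input is expressed as an array of content parts in a single message. A minimal sketch: the `image_url` part shape follows the standard OpenAI chat-completions convention, while the exact part types DashScope accepts for audio and video inputs should be confirmed against Alibaba's documentation before use.

```javascript
// Interleaved text + image request body for the OpenAI-compatible endpoint.
// The image_url part follows the OpenAI convention; audio/video part types
// are assumptions to verify against the DashScope docs.
const body = {
  model: 'qwen3.5-omni-plus',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is happening in this frame?' },
        { type: 'image_url', image_url: { url: 'https://example.com/frame.jpg' } },
      ],
    },
  ],
  stream: true,
};
```

The same `content` array can carry additional parts, which is how interleaved signals reach the Thinker component in one request.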

Advanced Synchronization and Performance

A standout technical feature of this model is the Adaptive Rate Interleave Alignment (ARIA) system, which synchronizes text and speech tokens to ensure natural-sounding voice responses. The model supports real-time semantic interruption, allowing users to interrupt the AI mid-response during a conversation. It is optimized for both enterprise-grade multimodal analysis and consumer-facing real-time voice assistants, offering performance in vision and audio tasks that matches or exceeds proprietary flagship models.
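Requesting the Talker's speech output over the same endpoint is typically done via `modalities` and `audio` parameters, following the OpenAI audio-output convention. This is a hedged sketch: the voice name below is a placeholder, and the supported voices and formats should be checked in the DashScope documentation.

```javascript
// Request interleaved text + streamed speech from the Talker component.
// 'Cherry' is a hypothetical voice name; consult DashScope for real values.
const speechRequest = {
  model: 'qwen3.5-omni-plus',
  messages: [{ role: 'user', content: 'Read back my meeting summary.' }],
  modalities: ['text', 'audio'],             // ask for speech alongside text
  audio: { voice: 'Cherry', format: 'wav' }, // placeholder voice/format
  stream: true,                              // required for low-latency playback
};
```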

Specialized for Low-Latency Interaction

The model's architecture is specifically tuned for real-time applications where latency is critical. By using a Mixture-of-Experts (MoE) approach with a gated delta networks architecture, the model maintains high computational efficiency. This efficiency enables it to provide real-time audio interaction while managing a 256k token context window, making it suitable for long-form content analysis such as meeting transcripts and cinematic video indexing.
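Before sending long-form material, it can help to pre-check whether a transcript will fit the window. A rough sketch using a ~4 characters-per-token heuristic (deliberately not the actual Qwen tokenizer) and the 256K context / 8K output limits quoted above:

```javascript
// Crude pre-flight check: will a transcript fit the 256K window after
// reserving room for the response? ~4 chars/token is a heuristic only.
const CONTEXT_TOKENS = 256_000;
const MAX_OUTPUT_TOKENS = 8_000;

function fitsContext(text, reservedOutput = MAX_OUTPUT_TOKENS) {
  const estimatedTokens = Math.ceil(text.length / 4);
  return estimatedTokens + reservedOutput <= CONTEXT_TOKENS;
}

console.log(fitsContext('a'.repeat(900_000))); // ~225K tokens → true
```

For production use, a real tokenizer count (or the usage data returned by the API) should replace the heuristic.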

Qwen3.5-Omni Use Cases

Discover the different ways you can use Qwen3.5-Omni to achieve great results.

Real-time Voice Assistants

Developers use the model to build interactive AI avatars that engage in natural voice conversations, with semantic interruption support.

Cinematic Video Captioning

It generates screenplay-level descriptions and timestamped annotations for high-definition long-form video content.

Audio-Visual Live Coding

Developers fix code by showing their screen and verbally explaining the logic to the model in real time.

Enterprise Audio Archiving

The system processes up to 10 hours of meeting recordings or podcasts to extract insights in one pass.

Multilingual Translation Services

It provides end-to-end speech-to-speech translation across 113 languages and various regional Chinese dialects.

Content Moderation

The model audits video and audio streams for safety by identifying visual and verbal prohibited content simultaneously.

Strengths

Native Omnimodal Fusion: It integrates text, vision, and audio in one model, achieving SOTA results across 215 multimodal subtasks.

Vast Audio Horizon: The 256K context window allows for processing over 10 hours of continuous audio data in a single request.

Low-Latency Real-time Voice: The Thinker-Talker architecture ensures sub-second response times for interactive, interruptible voice conversations.

Aggressive Efficiency Pricing: At $0.40/1M input tokens, it provides flagship-level multimodal capabilities at a low cost compared to competitors.

Limitations

High GPU Requirement: Local deployment of the omnimodal MoE architecture requires significant VRAM compared to text-only models.

Regional API Latency: Real-time performance is currently optimized for users close to Alibaba Cloud's primary regional clusters in Asia.

Text Reasoning Gap: While excellent at multimodal tasks, its pure logic performance (GPQA 83.9%) lags behind specialized reasoning models.

Experimental Visual Coding: The vibe coding feature is an emergent capability and can struggle with complex spatial UI coordinates in video.

API Quick Start

alibaba/qwen3.5-omni-plus

import OpenAI from 'openai';

// Qwen3.5-Omni is served through DashScope's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

// Streaming keeps latency low for interactive use.
const completion = await client.chat.completions.create({
  model: 'qwen3.5-omni-plus',
  messages: [{ role: 'user', content: 'Analyze this video content.' }],
  modalities: ['text'], // text-only output; add 'audio' for speech responses
  stream: true,
});

// Print the response as tokens arrive.
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Qwen3.5-Omni

"The Audio-Visual Vibe Coding is a game changer; it finally understands what I am showing on screen while I explain the bug." (dev_mindset, Reddit)

"Qwen3.5-Omni's ability to handle 10 hours of audio in one context is insane for researchers and podcasters." (AI_Explorer_01, Twitter)

"The voice cloning sounds surprisingly natural compared to the previous generation, almost indistinguishable in English." (TechGuru_Reviews, YouTube)

"Finally, a model that doesn't just cut me off mid-sentence; the semantic interruption works as advertised." (hacker_news_user, Hacker News)

"Impressive numbers on the new Qwen3.6 27B, but the Omni version is the one everyone will use for real products." (David Hendrickson, Twitter)

"I tried interrupting it five times, and it caught my intent every single time." (Matt Shumer, YouTube)

Related Videos

Watch tutorials, reviews, and discussions about Qwen3.5-Omni

The Thinker-Talker architecture is a massive leap forward for real-time latency [04:15].

It handles 400 seconds of video which is double what we usually see [07:22].

This model is natively end-to-end multilingual and multimodal [10:05].

The ARIA system prevents the pronunciation errors found in standard TTS [15:30].

You can literally show your screen and have a fluid conversation about the code [22:10].

I tried interrupting it five times, and it caught my intent every single time [08:30].

The way it writes code based on what it sees in the video is spooky [10:45].

This is the first real competitor to GPT-4o's voice mode we have seen [14:20].

It supports 113 languages for speech recognition, which is a huge advantage [18:55].

The vision extraction is far more robust for complex PDFs and video [25:15].

The 10-hour audio context is the real star here for enterprise use [12:10].

Performance in non-English languages is where Qwen really pulls ahead [15:40].

It can distinguish between background noise and actual user interruption [19:22].

Pricing is very competitive, especially for the scale of parameters active [24:10].

This is currently the most capable model for Python automation involving visual UI [28:45].


Pro Tips

Expert tips to help you get the most out of Qwen3.5-Omni and achieve better results.

Optimize Audio Ingestion

Segment audio longer than 10 hours to maintain factual retrieval accuracy within the 256k context window.
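One way to apply this tip is to plan fixed-length segments up front before upload. A simple sketch, with the 10-hour threshold (36,000 seconds) taken from the guidance above:

```javascript
// Split a long recording into fixed-length segments before upload.
// Durations are in seconds; 10 hours = 36,000 s, per the tip above.
function planSegments(totalSeconds, segmentSeconds = 36_000) {
  const segments = [];
  for (let start = 0; start < totalSeconds; start += segmentSeconds) {
    segments.push({ start, end: Math.min(start + segmentSeconds, totalSeconds) });
  }
  return segments;
}

console.log(planSegments(90_000).length); // a 25-hour archive → 3 segments
```

In practice you would also overlap segment boundaries slightly so that no sentence is cut in half at a split point.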

Leverage Semantic Interruption

Enable native turn-taking features in voice apps to distinguish user intent from background noise.

Use ARIA for Technical Terms

Utilize streaming speech mode to benefit from ARIA alignment, which ensures technical numbers are pronounced accurately.

Video Frame Rate Control

Upload standard video at 1 FPS, but increase the rate for high-action scenes to ensure visual precision.
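To reason about how many frames a given sampling rate produces, a small helper can compute the sample timestamps before upload. The 1 FPS default mirrors the tip above; whether frames are extracted client-side or via an fps parameter depends on your ingestion pipeline, so treat this as a planning sketch.

```javascript
// Compute frame-sample timestamps (in seconds) for a clip at a chosen rate.
function sampleTimestamps(durationSeconds, fps = 1) {
  const step = 1 / fps;
  const times = [];
  for (let t = 0; t < durationSeconds; t += step) times.push(t);
  return times;
}

const standard = sampleTimestamps(10);      // 10 frames at the default 1 FPS
const highAction = sampleTimestamps(10, 4); // 40 frames for a fast-moving scene
```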

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

GPT-5.4 (OpenAI)
GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.
1M context · $2.50/$15.00 per 1M tokens (input/output)

Kimi K2 Thinking (Moonshot)
Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...
256K context · $0.60/$2.50 per 1M tokens (input/output)

GPT-5.2 (OpenAI)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
400K context · $1.75/$14.00 per 1M tokens (input/output)

Qwen3.6-Max-Preview (Alibaba)
Qwen3.6-Max-Preview is Alibaba's flagship MoE model featuring 1M context, a native thinking mode, and SOTA scores in agentic coding and reasoning.
1M context · $1.25/$10.00 per 1M tokens (input/output)

GLM-5 (Zhipu)
GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200K context window.
200K context · $1.00/$3.20 per 1M tokens (input/output)

GLM-5.1 (Zhipu)
GLM-5.1 is Zhipu AI's flagship reasoning model, featuring a 202K context window and an autonomous 8-hour execution loop for complex agentic engineering.
203K context · $1.40/$4.40 per 1M tokens (input/output)

GPT-5.3 Codex (OpenAI)
GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...
400K context · $1.75/$14.00 per 1M tokens (input/output)

Gemini 3.1 Flash-Lite (Google)
Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.
1M context · $0.25/$1.50 per 1M tokens (input/output)

Frequently Asked Questions

Find answers to common questions about Qwen3.5-Omni