
Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput improvement over previous dense flagship models.

Multimodal · MoE · Open-Weights · Agentic AI · Reasoning
Alibaba · Qwen3.5 · Released February 16, 2026
Context
1.0M tokens
Max Output
66K tokens
Input Price
$0.40 / 1M tokens
Output Price
$2.40 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
88.4%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3.5-397B-A17B scored 88.4% on this benchmark.
HLE
28.7%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized domains, designed to remain difficult after older benchmarks saturated. Even top models score far below human expert level. Qwen3.5-397B-A17B scored 28.7% on this benchmark.
MMLU
88.4%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3.5-397B-A17B scored 88.4% on this benchmark.
MMLU Pro
87.8%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3.5-397B-A17B scored 87.8% on this benchmark.
SimpleQA
41.3%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Qwen3.5-397B-A17B scored 41.3% on this benchmark.
IFEval
92.6%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3.5-397B-A17B scored 92.6% on this benchmark.
AIME 2025
91.3%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3.5-397B-A17B scored 91.3% on this benchmark.
MATH
74.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3.5-397B-A17B scored 74.1% on this benchmark.
GSM8k
93.7%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3.5-397B-A17B scored 93.7% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3.5-397B-A17B scored 92% on this benchmark.
MathVista
90.3%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Qwen3.5-397B-A17B scored 90.3% on this benchmark.
SWE-Bench
76.4%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3.5-397B-A17B scored 76.4% on this benchmark.
HumanEval
79.3%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3.5-397B-A17B scored 79.3% on this benchmark.
LiveCodeBench
83.6%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3.5-397B-A17B scored 83.6% on this benchmark.
MMMU
85%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Qwen3.5-397B-A17B scored 85% on this benchmark.
MMMU Pro
79%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Qwen3.5-397B-A17B scored 79% on this benchmark.
ChartQA
88%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Qwen3.5-397B-A17B scored 88% on this benchmark.
DocVQA
90.8%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Qwen3.5-397B-A17B scored 90.8% on this benchmark.
Terminal-Bench
52.5%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3.5-397B-A17B scored 52.5% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3.5-397B-A17B scored 12% on this benchmark.

About Qwen3.5-397B-A17B

Learn about Qwen3.5-397B-A17B's capabilities, features, and how it can help you achieve better results.

High-Efficiency Mixture of Experts

Qwen3.5-397B-A17B is a flagship native multimodal model built on a hybrid architecture that fuses linear attention via Gated Delta Networks with a sparse Mixture-of-Experts (MoE). Although it contains 397 billion total parameters, its sparse design activates only 17 billion per forward pass, delivering exceptional inference efficiency and speed without compromising reasoning quality. It is optimized for both language and visual tasks, with a 250k-token vocabulary and support for over 201 languages and dialects.
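The sparse activation described above can be illustrated with a toy top-k gating function. This is a simplified sketch, not the model's actual router; the gate scores, expert count, and k below are purely illustrative.

```javascript
// Toy Mixture-of-Experts router: score every expert for a token,
// then activate only the top-k — the remaining experts stay idle.
function topKExperts(gateScores, k) {
  return gateScores
    .map((score, expertId) => ({ expertId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.expertId);
}

// Fraction of total weights touched per forward pass,
// using the figures quoted above (17B active of 397B total).
function activeFraction(totalParamsB, activeParamsB) {
  return activeParamsB / totalParamsB;
}

const chosen = topKExperts([0.1, 0.7, 0.05, 0.9, 0.2], 2);
console.log(chosen);                  // [3, 1] — the two highest-scoring experts
console.log(activeFraction(397, 17)); // ≈ 0.043 — only ~4% of weights per token
```

That ~4% active fraction is the source of the efficiency claims: compute per token scales with the 17B active parameters, while total capacity scales with all 397B.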

Native Multimodal Agentic Workflows

The model excels as a native multimodal agent, capable of processing up to one million tokens of context, which is equivalent to approximately two hours of video. It introduces a specialized Thinking Mode for complex logical reasoning and is natively equipped for agentic workflows, including web development, GUI navigation, and real-world spatial intelligence. Its architecture supports FP8 end-to-end training and a disaggregated training-inference framework, making it one of the most scalable and efficient models for enterprise-grade AI applications.
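The context claim can be sanity-checked with back-of-the-envelope arithmetic: if 1M tokens corresponds to roughly two hours of video, the implied ingestion rate is about 139 tokens per second of footage. The per-second rate is our inference from the two quoted figures, not a published specification.

```javascript
// Rough context-budget math from the figures quoted above:
// 1,000,000 tokens ≈ 120 minutes of video.
const contextTokens = 1_000_000;
const videoSeconds = 2 * 60 * 60; // two hours

const tokensPerSecond = contextTokens / videoSeconds;
console.log(Math.round(tokensPerSecond)); // ≈ 139 tokens per second of video

// Budget left for text prompts alongside a 90-minute clip:
const clipTokens = 90 * 60 * tokensPerSecond;
console.log(contextTokens - clipTokens); // 250000 tokens to spare
```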

Open Weights for Global Accessibility

Released under the Apache 2.0 license, this model provides the open-source community with frontier-level capabilities previously restricted to proprietary systems. It bridges the gap between massive parameter counts and practical deployment, allowing organizations to run state-of-the-art reasoning tasks on private infrastructure with significantly lower compute overhead than dense 400B alternatives.

Qwen3.5-397B-A17B

Use Cases

Discover the different ways you can use Qwen3.5-397B-A17B to achieve great results.

Long-Horizon Video Analysis

Analyze up to two hours of video content to extract logic, reverse-engineer code from footage, or generate structured summaries.

PhD-Level STEM Research

Solve graduate-level PhD science questions and olympiad-level math problems using its adaptive deep-thinking mode.

Autonomous GUI Agents

Automate interactions with smartphones and computers to handle office workflows and cross-app mobile navigation.

Visual Software Engineering

Execute 'vibe coding' by turning natural language instructions and UI sketches into functional frontend code.

Document Intelligence

Process complex documents, charts, and handwritten sketches to extract structured data and reverse-engineer layouts.

Spatial AI Applications

Understand pixel-level relationships for embodied AI tasks like autonomous driving scene analysis and robotic navigation.

Strengths

Superior Video Support: Supports 1 million tokens of context, allowing native processing of up to 120 minutes of video for agentic and coding tasks.
MoE Inference Efficiency: The 397B total / 17B active architecture provides a 19x decoding throughput boost compared to previous dense flagship models.
State-of-the-Art Reasoning: Achieves 91.3% on AIME and 88.4% on GPQA, rivaling top closed-source models in PhD-level science and math.
Apache 2.0 Open Weights: Offers frontier-level intelligence with the freedom of open weights, allowing private, on-premise deployment.

Limitations

Massive Hardware Barrier: Full deployment requires server-grade GPU racks with over 800GB of VRAM at uncompressed 16-bit precision.
HLE Knowledge Gap: Despite high scores in science and math, it reaches only 28.7% on Humanity's Last Exam (HLE), indicating a gap at the frontier of expert knowledge.
Tool Overconfidence: In autonomous agent scenarios, the model occasionally hallucinates tool outputs or ignores results in favor of internal predictions.
Terminal Task Performance: Scores 52.5% on Terminal-Bench 2.0, trailing competitors in complex command-line interaction tasks.

API Quick Start

alibaba/qwen3.5-plus

View Documentation
alibaba SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'qwen3.5-plus',
    messages: [{ role: 'user', content: 'Analyze the logic of this MoE architecture.' }],
    // DashScope-specific extension. `extra_body` is a Python-SDK idiom;
    // in the Node SDK, pass the extra field directly in the request body.
    enable_thinking: true,
  });
  console.log(completion.choices[0].message.content);
}

main().catch(console.error);

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Qwen3.5-397B-A17B

Qwen3.5-397B is essentially a GPT-5 class model but open-weight. The DeltaNet architecture is solving the MoE latency issues perfectly.
u/DeepLearningLover
reddit
Native multimodal reasoning on Qwen3.5 looks incredible. 1M context + video analysis is going to change agent workflows.
@AiDevDaily
twitter
The decision to use FP8 end-to-end training while maintaining BF16 in sensitive layers is a masterclass in stability optimizations.
cold_fusion
hackernews
This is the first time I've seen an open model actually beat Gemini 1.5 Pro on complex multimodal agent tasks.
AI Revolution
youtube
The 19x decoding throughput improvement over Qwen3-Max makes this a viable alternative for production-level agents.
u/ModelTester2026
reddit
I was surprised by how well it handles 4-bit quantization. It keeps almost all the reasoning capability on a dual A100 setup.
@GlobalTechReview
twitter

Related Videos

Watch tutorials, reviews, and discussions about Qwen3.5-397B-A17B

A 397 billion parameter model, but with 17 billion parameters active.

When decoding at 256K, this model is 19 times faster than Qwen 3 Max.

The native vision-language reasoning is what sets this apart for agentic workflows.

This beats most closed models on the standard math benchmarks.

Running this locally is tough, but the quantized versions are workable on high-end Macs.

397 billion parameter model with 17 billion active parameters. It's natively multimodal.

It's probably currently the best open-source multimodal model.

The ability to process two hours of video natively is a massive advantage.

Look at these logic scores, it's hitting GPT-4o levels consistently.

The Apache license makes this very attractive for corporate data privacy.

OCR structured extraction. You have a messy PDF... and you need to turn that into clean JSON. This model excels there.

You get the intelligence of a 400 billion parameter giant... but you pay the compute cost of a 17 billion parameter model.

It handles long-context retrieval better than the previous version.

The tool use integration is built into the base training, not an afterthought.

Thinking mode allows it to correct its own logic before outputting.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Qwen3.5-397B-A17B and achieve better results.

Toggle Thinking Mode

Pass the 'enable_thinking: true' parameter in your API call to activate deep reasoning for math, coding, and complex logic puzzles.
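A small helper makes the toggle explicit. Note that `enable_thinking` is a DashScope-specific extension to the OpenAI-compatible schema, and the model name below mirrors the Quick Start above; the helper itself is a hypothetical convenience, not part of any SDK.

```javascript
// Build a chat-completions payload, switching deep reasoning on or off.
// `enable_thinking` is DashScope's extension — standard OpenAI-compatible
// servers may reject unknown fields, so only attach it when requested.
function buildRequest(prompt, { thinking = false } = {}) {
  const payload = {
    model: 'qwen3.5-plus',
    messages: [{ role: 'user', content: prompt }],
  };
  if (thinking) payload.enable_thinking = true;
  return payload;
}

const deep = buildRequest('Prove that the sum of two odd numbers is even.', { thinking: true });
const fast = buildRequest('What is the capital of France?');
console.log(deep.enable_thinking);      // true
console.log('enable_thinking' in fast); // false
```

Reserving the flag for math, coding, and logic tasks keeps simple queries from paying the latency and token cost of the thinking phase.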

Utilize Fast Mode

Use 'Fast' mode for simple queries to get instant answers without consuming tokens on unnecessary internal thinking phases.

Optimize Video Prompts

When analyzing video, prompt the model to focus on the final dynamic outcome rather than frame-by-frame analysis for better temporal coherence.

Leverage Quantization

Use 4-bit or 8-bit quantization (GGUF/EXL2) to shrink the memory footprint dramatically; even the 4-bit variant still needs roughly 200GB of combined (V)RAM, so this is workstation or multi-GPU territory rather than typical consumer hardware.
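The VRAM figures quoted here and in the Limitations section are consistent with simple weight-size arithmetic (this ignores KV cache and activation memory, which add real overhead on top):

```javascript
// Approximate weight footprint: parameter count × bits per parameter.
// 397B parameters at 16-bit (2 bytes) vs 4-bit (0.5 bytes) precision.
function weightGB(paramsBillions, bitsPerParam) {
  return (paramsBillions * 1e9 * bitsPerParam) / 8 / 1e9; // decimal GB
}

console.log(weightGB(397, 16)); // 794 GB — roughly the "over 800GB" 16-bit figure
console.log(weightGB(397, 4));  // 198.5 GB — why ~200GB is quoted for 4-bit
```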

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

GPT-5.1

OpenAI

GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

400K context
$1.25/$10.00/1M
Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context
$0.60/$3.00/1M
Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context
$3.00/$15.00/1M
Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M
Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M
Claude Sonnet 4.6

Anthropic

Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.

1M context
$3.00/$15.00/1M
Gemini 3 Flash

Google

Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.

1M context
$0.50/$3.00/1M
GLM-5

Zhipu (GLM)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context
$1.00/$3.20/1M

Frequently Asked Questions

Find answers to common questions about Qwen3.5-397B-A17B