Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...

Tags: Multimodal, MoE, Open-Weights, Agentic AI, Reasoning
Provider: Alibaba (Qwen)
Released: 2026-02-16
Context: 1.0M tokens
Max Output: 8K tokens
Input Price: $0.60 / 1M tokens
Output Price: $3.60 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
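
As a rough illustration of the listed pricing, the sketch below estimates the cost of a single request. The helper name estimateCostUSD is hypothetical; the only inputs taken from this page are the two per-million-token prices and the 1M context and 8K output limits. Actual billing may differ (for example through tiered or cached-token pricing, which this page does not describe).

```javascript
// Prices as listed on this page (USD per 1M tokens).
const INPUT_PRICE_PER_M = 0.60;
const OUTPUT_PRICE_PER_M = 3.60;

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_M;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// A full 1M-token prompt with a maximum-length 8K-token reply.
console.log(estimateCostUSD(1_000_000, 8_000).toFixed(4)); // prints "0.6288"
```

At these rates the prompt dominates the bill for long-context requests: filling the whole context window costs about $0.60, while a maximum-length reply adds under three cents.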
Benchmarks
GPQA
88.4%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD-level experts achieve only 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3.5-397B-A17B scored 88.4% on this benchmark.
HLE
28.7%
HLE: Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge. Qwen3.5-397B-A17B scored 28.7% on this benchmark.
MMLU
88.6%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3.5-397B-A17B scored 88.6% on this benchmark.
MMLU Pro
87.8%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3.5-397B-A17B scored 87.8% on this benchmark.
SimpleQA
48%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Qwen3.5-397B-A17B scored 48% on this benchmark.
IFEval
92.6%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3.5-397B-A17B scored 92.6% on this benchmark.
AIME 2025
91.3%
AIME 2025: American Invitational Mathematics Examination. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3.5-397B-A17B scored 91.3% on this benchmark.
MATH
74.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3.5-397B-A17B scored 74.1% on this benchmark.
GSM8k
93.7%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3.5-397B-A17B scored 93.7% on this benchmark.
MGSM
92.1%
MGSM: Multilingual Grade School Math. 250 problems from the GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3.5-397B-A17B scored 92.1% on this benchmark.
MathVista
90.3%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Qwen3.5-397B-A17B scored 90.3% on this benchmark.
SWE-Bench
76.4%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3.5-397B-A17B scored 76.4% on this benchmark.
HumanEval
79.3%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3.5-397B-A17B scored 79.3% on this benchmark.
LiveCodeBench
83.6%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3.5-397B-A17B scored 83.6% on this benchmark.
MMMU
85%
MMMU: Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Qwen3.5-397B-A17B scored 85% on this benchmark.
MMMU Pro
79%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Qwen3.5-397B-A17B scored 79% on this benchmark.
ChartQA
86.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Qwen3.5-397B-A17B scored 86.5% on this benchmark.
DocVQA
93.2%
DocVQA: Document Visual Question Answering. Tests the ability to extract and reason about information from document images including forms, reports, and scanned text. Qwen3.5-397B-A17B scored 93.2% on this benchmark.
Terminal-Bench
52.5%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3.5-397B-A17B scored 52.5% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern-recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3.5-397B-A17B scored 12% on this benchmark.

About Qwen3.5-397B-A17B

Learn about Qwen3.5-397B-A17B's capabilities, features, and how it can help you achieve better results.

A Monumental Leap in Open AI

Qwen3.5-397B-A17B marks a major step in Alibaba Cloud's AI strategy, moving from a strong open-source contender to a frontier-level system designed for the agentic AI era. Released on February 16, 2026, it is the flagship of the Qwen3.5 series and uses a 397-billion-parameter Mixture-of-Experts (MoE) architecture. Because it activates only 17 billion parameters per token, it achieves a reported 19x decoding throughput boost over its predecessor, Qwen3-Max, while narrowing the performance gap with the most advanced proprietary models.
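
To make the sparsity concrete, the short sketch below computes the share of parameters that is active for each token, using only the two figures in the model name (397B total, 17B active). The function name activeFraction is hypothetical. This is simple arithmetic, not a throughput model: the 19x figure also depends on factors this page does not detail.

```javascript
// Figures from the model name: 397B total parameters, 17B active per token.
const TOTAL_PARAMS_B = 397;
const ACTIVE_PARAMS_B = 17;

// Share of the weights that participate in each token's forward pass.
function activeFraction(activeB, totalB) {
  return activeB / totalB;
}

const fraction = activeFraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B);
console.log(`${(fraction * 100).toFixed(1)}% of parameters active per token`); // prints "4.3% of parameters active per token"
```

Only the per-token compute shrinks with sparsity. All 397B parameters still have to be held in memory, which is why the hardware requirements remain high.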

Unified Multimodal Powerhouse

The model is a unified, native multimodal powerhouse. Unlike previous versions that required separate vision-language adapters, Qwen3.5 features early-fusion multimodality trained on trillions of multimodal tokens. This allows it to watch and reason over two hours of video content, operate as a GUI agent across desktop and mobile interfaces, and handle complex coding tasks in its specialized Thinking mode. With an expanded vocabulary of 250,000 tokens supporting 201 languages, it stands as the premier global choice for multilingual and multimodal automation.

Architected for the Agentic Era

Beyond simple chat, Qwen3.5-397B is optimized for tool use and autonomous workflows. Its high scores in function-calling benchmarks and instruction following make it an ideal backbone for visual software engineering and PhD-level research. By offering state-of-the-art performance under an Apache 2.0 license, Alibaba has provided the community with a credible, high-efficiency alternative to the most restricted closed-source models.

Use Cases for Qwen3.5-397B-A17B

Discover the different ways you can use Qwen3.5-397B-A17B to achieve great results.

Autonomous GUI Agents

Navigates complex PC and smartphone interfaces to complete multi-step office automation workflows.

Long-Form Video Intelligence

Extracts deep causal reasoning and summaries from continuous video files up to 120 minutes long.

Vibe Coding & Prototyping

Translates UI sketches directly into production-ready React and frontend logic in a single shot.

PhD-Level Research

Solves graduate-level STEM problems using specialized internal chain-of-thought Thinking mode.

Multilingual Global Support

Engages users across 201 languages with superior tokenization efficiency for non-English scripts.

Visual Software Engineering

Transforms wireframes and screenshots into clean, layout-aware HTML, CSS, and JavaScript code.

Strengths

Inference Efficiency: Achieves 19x decoding throughput gains by activating only 17B parameters via its hybrid MoE architecture.
Native Video Reasoning: Processes up to 120 minutes of continuous video natively without the need for frame-extraction adapters.
Top-Tier STEM Capability: Rivals proprietary reasoning models with an 88.4% score on GPQA and 91.3% on AIME 2025 math exams.
Open-Weight Accessibility: Provides frontier-level multimodal intelligence under the Apache 2.0 license for private enterprise deployment.

Limitations

Massive Hardware Demand: At 397B total parameters, running unquantized versions locally requires high-end server-grade infrastructure.
Audio Modality Gap: Lacks the native audio input and output capabilities found in 'omni' models like GPT-4o or Gemini.
HLE Performance Gap: Trails proprietary leaders on Humanity's Last Exam (28.7%), indicating gaps in niche expert knowledge.
Memory Footprint: The sheer scale requires substantial VRAM even with sparsity, limiting widespread consumer-level deployment.

API Quick Start

alibaba/qwen-3.5-plus

alibaba SDK
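import OpenAI from 'openai';

// OpenAI-compatible client pointed at the DashScope international endpoint.
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'qwen-3.5-397b-instruct',
    messages: [{ role: 'user', content: 'Analyze this 2-hour video context.' }],
    // extra_body is a Python SDK option; the Node SDK sends unrecognized
    // fields as-is, so provider-specific parameters go at the top level.
    enable_thinking: true,
  });
  console.log(completion.choices[0].message.content);
}

// Surface request errors instead of leaving an unhandled promise rejection.
main().catch(console.error);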

Install the SDK (npm install openai) and start making API calls in minutes.

What People Are Saying About Qwen3.5-397B-A17B

See what the community thinks about Qwen3.5-397B-A17B

Qwen3.5-397B is basically the open-source community's answer to GPT-4o. The SVG capability alone is insane for web design.
u/LLM_Reviewer
reddit
The 19x throughput boost makes Qwen3.5 feel significantly more responsive than any other model of this size I have tested.
tech_enthusiast_99
reddit
Apache 2.0 for a model this large is a total game changer for local AI development and privacy-focused enterprises.
TechInnovator88
twitter
The MoE routing in the 3.5-397B model is noticeably more intelligent than the previous 2.5 generation; it actually follows logic.
DistanceSolar1449
reddit
The 1M context on an open-weight model of this caliber is unprecedented in the current ecosystem.
dev_logic
hackernews
The video reasoning isn't just frame-by-frame; it's actual temporal understanding that feels miles ahead of current vision LLMs.
Matthew Berman (Context)
youtube

Videos About Qwen3.5-397B-A17B

Watch tutorials, reviews, and discussions about Qwen3.5-397B-A17B

It beats Claude Opus 4.5 on browser comp as well as Gemini 3 Pro in several multimodal tasks.

Reportedly 19 times faster than the Qwen 3 Max which supports 201 languages and dialects.

It did a pretty great job with the photorealistic butterfly... better than most open-source models.

The 397B model is essentially the first open-weights model to truly compete at the frontier of AGI.

Scaling with MoE is clearly working for Alibaba and their latest benchmark results prove it.

This model is matching what their Qwen Max was able to do... but it's able to do it with up to a 19x speed boost.

The tokenizer has actually gone to a vocab of 250K... matching Gemini and the Google tokenizer.

You have to think of the Qwen team as a Frontier Lab... they're jumping into tasks proprietary labs focus on.

The tokenization is much more efficient for non-Latin scripts compared to earlier Llama iterations.

Thinking mode adds significant latency but the accuracy gain is worth it for coding and reasoning.

This is a unified vision language model... where prior models had a specific VL variant, this has everything contained within a single model.

Video understanding allows it to catch temporal details that frame-extraction methods miss.

In terms of coding, it feels as responsive as the GPT-4o model but with better instruction following.

Desktop GUI agent capability is the standout feature here for real-world automation.

It handles 120 minutes of video without losing context, which is just massive for analysis.

Pro Tips for Qwen3.5-397B-A17B

Expert tips to help you get the most out of Qwen3.5-397B-A17B and achieve better results.

Toggle Thinking Mode

Use the enable_thinking parameter for logic-heavy tasks to activate deep internal reasoning paths.
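
One way to apply this tip is to decide per request whether thinking is worth the extra latency. The sketch below is a minimal illustration: buildChatRequest is a hypothetical helper, the model ID is copied from the quick-start example on this page, and it assumes the enable_thinking parameter named in the tip is accepted as a top-level field of the request body.

```javascript
// Hypothetical helper: build the request body for a chat completion.
// Assumption: enable_thinking is accepted as a top-level body field.
function buildChatRequest(prompt, { thinking = false } = {}) {
  return {
    model: 'qwen-3.5-397b-instruct', // ID copied from the quick-start example
    messages: [{ role: 'user', content: prompt }],
    enable_thinking: thinking,
  };
}

// Logic-heavy task: turn thinking on.
const proofRequest = buildChatRequest(
  'Prove that the square root of 2 is irrational.',
  { thinking: true },
);

// Simple lookup: leave thinking off to avoid the added latency.
const lookupRequest = buildChatRequest('What is the capital of France?');

console.log(proofRequest.enable_thinking, lookupRequest.enable_thinking); // prints "true false"
```

The returned object can be passed directly to client.chat.completions.create when using the OpenAI-compatible Node SDK.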

Leverage Native Search

Enable the search body parameter to verify facts against real-time web data and execute Python code.

Optimize Video Prompts

Provide specific timestamp anchors to focus the 1M token context window on the most relevant segments.
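
A timestamp anchor can be as simple as a list of time ranges placed at the start of the prompt. The sketch below shows one possible shape. Both helper names and the HH:MM:SS format with the "Focus on these segments" wording are assumptions for illustration; this page does not specify a required format.

```javascript
// Format a number of seconds as HH:MM:SS.
function formatTimestamp(totalSeconds) {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = Math.floor(totalSeconds % 60);
  return [h, m, s].map((n) => String(n).padStart(2, '0')).join(':');
}

// Hypothetical helper: prefix a question with the segments to focus on.
function buildVideoPrompt(question, segments) {
  const anchors = segments
    .map(({ start, end }) => `${formatTimestamp(start)}-${formatTimestamp(end)}`)
    .join(', ');
  return `Focus on these segments: ${anchors}. ${question}`;
}

console.log(buildVideoPrompt('What changed between the two demos?', [
  { start: 754, end: 910 },
  { start: 5400, end: 5520 },
]));
// prints "Focus on these segments: 00:12:34-00:15:10, 01:30:00-01:32:00. What changed between the two demos?"
```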

Regional Endpoint Selection

Use the dashscope-intl endpoint for users outside mainland China to reduce network latency.
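
Endpoint selection can be kept in one small function so the rest of the client code does not hard-code a region. In the sketch below, the international URL is the one used in the quick-start example on this page. The mainland China URL and the selectBaseURL helper are assumptions not stated here and should be checked against the provider's documentation.

```javascript
// International endpoint, as used in the quick-start example on this page.
const INTL_BASE_URL = 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1';
// Assumption: mainland China endpoint; not stated on this page.
const CN_BASE_URL = 'https://dashscope.aliyuncs.com/compatible-mode/v1';

// Hypothetical helper: pick a base URL from a region flag.
// Anything other than 'cn' falls back to the international endpoint.
function selectBaseURL(region) {
  return region === 'cn' ? CN_BASE_URL : INTL_BASE_URL;
}

console.log(selectBaseURL('intl')); // prints "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
```

The result can be passed as the baseURL option when constructing the OpenAI-compatible client.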

Related AI Models

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25 input / $10.00 output per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context, $0.60 input / $2.50 output per 1M tokens

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context, $3.00 input / $15.00 output per 1M tokens

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context, $5.00 input / $25.00 output per 1M tokens

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context, $3.00 input / $15.00 output per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context, $0.50 input / $3.00 output per 1M tokens

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
200K context, $5.00 input / $25.00 output per 1M tokens

GLM-4.7 (Zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context, $0.60 input / $2.20 output per 1M tokens

Frequently Asked Questions About Qwen3.5-397B-A17B

Find answers to common questions about Qwen3.5-397B-A17B