Qwen3.6-Max-Preview

Qwen3.6-Max-Preview is Alibaba's flagship MoE model featuring 1M context, a native thinking mode, and SOTA scores in agentic coding and reasoning.

MoE · Agentic Coding · 1M Context · Frontier Model · Alibaba Qwen
Alibaba · Qwen 3.6 · April 20, 2026
Context
1.0M tokens
Max Output
8K tokens
Input Price
$1.25 / 1M tokens
Output Price
$10.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
86%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3.6-Max-Preview scored 86% on this benchmark.
HLE
51%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning dozens of specialized academic and professional domains, designed to probe frontier models at the edge of human knowledge. Qwen3.6-Max-Preview scored 51% on this benchmark.
MMLU
83%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3.6-Max-Preview scored 83% on this benchmark.
MMLU Pro
79%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3.6-Max-Preview scored 79% on this benchmark.
SimpleQA
52%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Qwen3.6-Max-Preview scored 52% on this benchmark.
IFEval
75%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3.6-Max-Preview scored 75% on this benchmark.
AIME 2025
93%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3.6-Max-Preview scored 93% on this benchmark.
MATH
95%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3.6-Max-Preview scored 95% on this benchmark.
GSM8k
98%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3.6-Max-Preview scored 98% on this benchmark.
MGSM
92%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3.6-Max-Preview scored 92% on this benchmark.
MathVista
86%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Qwen3.6-Max-Preview scored 86% on this benchmark.
SWE-Bench
73%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3.6-Max-Preview scored 73% on this benchmark.
HumanEval
91%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3.6-Max-Preview scored 91% on this benchmark.
LiveCodeBench
79%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3.6-Max-Preview scored 79% on this benchmark.
MMMU
82%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Qwen3.6-Max-Preview scored 82% on this benchmark.
MMMU Pro
75%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Qwen3.6-Max-Preview scored 75% on this benchmark.
ChartQA
85%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Qwen3.6-Max-Preview scored 85% on this benchmark.
DocVQA
89%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Qwen3.6-Max-Preview scored 89% on this benchmark.
Terminal-Bench
65%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3.6-Max-Preview scored 65% on this benchmark.
ARC-AGI
14%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3.6-Max-Preview scored 14% on this benchmark.

About Qwen3.6-Max-Preview

Learn about Qwen3.6-Max-Preview's capabilities, features, and how it can help you achieve better results.

Qwen3.6-Max-Preview is the flagship proprietary large language model from Alibaba, representing the next step in their high-performance AI series. Utilizing a sparse Mixture-of-Experts (MoE) architecture, the model achieves the reasoning depth of a trillion-parameter system while maintaining significant operational efficiency. It is specifically optimized for agentic coding, world knowledge, and complex instruction following.

The model's standout feature is its native Thinking Mode, which allows the system to generate a visible internal chain-of-thought before delivering a final response. This transparency is particularly valuable for developers building autonomous agents, as it provides a clear window into logical planning and error-correction steps. Combined with a massive 1-million-token context window, the model can ingest entire project repositories or extensive documentation libraries in a single pass.

Hosted on Alibaba Cloud Model Studio, Qwen3.6-Max-Preview supports industry-standard protocols and is compatible with OpenAI-style API specifications. It is designed to be the primary choice for enterprises requiring frontier-level AI capabilities for multimodal data analysis and robust agentic workflows, offering a high-performance alternative to Western closed-source models.

Qwen3.6-Max-Preview

Use Cases

Discover the different ways you can use Qwen3.6-Max-Preview to achieve great results.

Autonomous Software Engineering

Deploy the model as a coding agent capable of navigating entire codebases, planning architectural changes, and fixing bugs across multiple files.

Large-Scale Technical Analysis

Utilize the 1M token context window to ingest complete documentation sets or legal frameworks for deep-dive analysis without RAG limitations.

Complex Reasoning and Planning

Leverage the native Thinking Mode to solve high-level mathematical problems where a multi-step internal plan is required for accuracy.

Multimodal Content Understanding

Analyze both static imagery and complex video sequences to extract data and summarize dynamic visual events.

Interactive Terminal Operations

Build tools that allow the AI to interact directly with shells and CLI environments, benefiting from its optimized Terminal-Bench performance.

Enterprise Agentic Workflows

Integrate the model into complex business pipelines where high instructional reliability and sophisticated tool-calling are required for automation.
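Since the model exposes tool-calling through the OpenAI-compatible specification, a workflow tool is declared with the standard `tools` schema. A minimal sketch follows; the `lookup_order` tool name and its parameters are hypothetical examples for illustration, not part of any Qwen API.

```javascript
// Builds an OpenAI-style tool definition for agentic workflows.
// The tool name and parameters below are hypothetical; only the
// surrounding schema (type/function/parameters) follows the standard
// OpenAI-compatible `tools` format.
function buildTool(name, description, properties, required) {
  return {
    type: 'function',
    function: {
      name,
      description,
      parameters: { type: 'object', properties, required },
    },
  };
}

const lookupOrder = buildTool(
  'lookup_order',
  'Fetch the status of a customer order by ID.',
  { order_id: { type: 'string', description: 'Internal order identifier' } },
  ['order_id'],
);

// The resulting object would be passed as `tools: [lookupOrder]` in a
// chat.completions.create call against the compatible-mode endpoint.
```

When the model decides to call the tool, the response carries a `tool_calls` entry whose arguments your pipeline executes before returning the result as a `tool` role message.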

Strengths

World-Leading Coding Ability: Achieves a 57.3% score on SWE-bench Pro, surpassing major frontier models like Claude 4.5 Opus for autonomous software tasks.
Enormous 1M Token Context: Handles massive datasets and full technical libraries within a single prompt without the typical context degradation of older architectures.
Transparent Native Reasoning: The built-in Thinking Mode exposes internal logic, allowing for higher reliability in complex problem-solving and easier debugging.
Aggressive Price Positioning: At $1.25 per million input tokens, it offers frontier-level performance at a fraction of the cost of Western proprietary equivalents.

Limitations

Closed Source Restriction: Unlike the Medium versions of Qwen 3.6, the Max-Preview is proprietary and cannot be self-hosted on local hardware.
High Output Token Premium: The $10.00/1M output pricing is an 8x markup over the input price, making long reasoning chains more expensive than ingestion.
Knowledge Cutoff Constraints: As a static preview model, it lacks real-time awareness of events or library updates beyond its early 2026 training cutoff.
Regional API Latency: Depending on the deployment region, international users may face higher latency compared to highly optimized local variants.

API Quick Start

alibaba/qwen3.6-max-preview

View Documentation
JavaScript (OpenAI-compatible SDK)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'qwen3.6-max-preview',
    messages: [{ role: 'user', content: 'Design a system architecture for a real-time AI agent.' }],
    // DashScope-specific parameter, passed through by the compatible-mode endpoint
    enable_thinking: true,
    stream: true,
  });

  for await (const chunk of completion) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

main();

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Qwen3.6-Max-Preview

The kind of performance you'd expect from a model running on a massive server farm is now sitting on your desktop.
softtechhubus
reddit
Qwen3.6-Max-Preview just beat Claude Opus 4.5 on SWE-Bench Pro. China is catching up fast.
BridgeMind
twitter
At $1.25 per million tokens, Qwen is significantly cheaper than Claude for large scale data ingestion.
TechReviewer2026
reddit
The fact that Thinking Mode is baked in as the default state is a meaningful design choice for agentic reliability.
DevGuru
twitter
Qwen has launched Qwen 3.6 Max Preview as a new top-end proprietary flagship model.
AICodeKing
youtube
It shows improved agentic coding and better real-world agent reliability compared to the Plus model.
Codedigipt
youtube

Related Videos

Watch tutorials, reviews, and discussions about Qwen3.6-Max-Preview

Qwen has launched Qwen 3.6 Max Preview as a new top-end proprietary flagship model.

The model shows a strong jump in coding-agent benchmarks like SkillsBench and Terminal-Bench 2.0.

Qwen is clearly trying to compete seriously at the high end against models like Claude 4.5 Opus.

This model represents a meaningful improvement in world knowledge and instruction following.

The performance jump on SWE-bench is what really sets this apart from the Plus variant.

The benchmark story is really about positioning the hosted Max Preview as distinct from the open-weight family.

We use Qwen Code pages and repo surfaces to judge the ecosystem depth beyond just the model weights.

The thinking mode is surprisingly fast compared to o1-style models from last year.

This is clearly designed for enterprise developers who need a reliable API for agentic tasks.

The multimodal vision performance is catching up to Gemini 2 in some document analysis tests.

This video introduces the Qwen3.6-Max-Preview, an early look at the next flagship model from Qwen.

It shows improved agentic coding and better real-world agent reliability compared to the Plus model.

The 1M context window is much more stable than what we saw in early Qwen 2 versions.

If you are doing a lot of coding, Qwen 3.6 Max is currently the benchmark leader.

Pricing remains very competitive even for their flagship closed-source model.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Qwen3.6-Max-Preview and achieve better results.

Enable Internal Reasoning

Set the 'enable_thinking' parameter to true in your API request to view the model's internal logic for debugging complex reasoning.
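A minimal sketch of such a request body, assuming the DashScope compatible-mode endpoint accepts `enable_thinking` as a pass-through parameter (as this page describes); the `reasoning_content` delta field mentioned in the comments is an assumption about the stream format and should be checked against the current Model Studio documentation.

```javascript
// Builds the request body for a thinking-enabled call.
// `enable_thinking` is the DashScope pass-through parameter named on
// this page; verify the exact behavior against current docs.
function buildThinkingRequest(model, prompt) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    enable_thinking: true,
    stream: true,
  };
}

const req = buildThinkingRequest('qwen3.6-max-preview', 'Prove that 17 is prime.');

// With an OpenAI-compatible client you would then iterate the stream,
// reading chunk.choices[0]?.delta?.reasoning_content (assumed field) for
// the internal trace and delta.content for the final answer.
```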

Preserve Long-Horizon Logic

Use the 'preserve_thinking' feature for multi-turn conversations to ensure the model maintains logical consistency across a session.
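One way to sketch this in code: keep the full message history across turns and set the flag on each request. Note that `preserve_thinking` is taken from this page's description and is hypothetical as written here; verify the exact parameter name against the Model Studio docs.

```javascript
// Minimal multi-turn history helper: returns a new history array with
// the latest turn appended, so each request sees the whole session.
function appendTurn(history, role, content) {
  return [...history, { role, content }];
}

let history = [];
history = appendTurn(history, 'user', 'Plan a migration to Postgres.');
history = appendTurn(history, 'assistant', 'Step 1: inventory the schema...');
history = appendTurn(history, 'user', 'Now estimate the downtime.');

const body = {
  model: 'qwen3.6-max-preview',
  messages: history,
  enable_thinking: true,
  preserve_thinking: true, // hypothetical flag, per this page's description
};
```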

Feed Entire Libraries

Take advantage of the 1M context window by providing full source materials instead of chunked data for better cross-file understanding.
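A simple sketch of this pattern: pack a whole directory into one prompt string, with a header per file so the model can attribute answers to sources. Real use should still check the packed total against the 1M-token context limit.

```javascript
import fs from 'node:fs';
import path from 'node:path';
import os from 'node:os';

// Concatenates every file in a directory into a single prompt string,
// prefixing each file's contents with a "=== filename ===" header.
function packDirectory(dir) {
  return fs.readdirSync(dir)
    .filter((f) => fs.statSync(path.join(dir, f)).isFile())
    .sort()
    .map((f) => `=== ${f} ===\n${fs.readFileSync(path.join(dir, f), 'utf8')}`)
    .join('\n\n');
}

// Demo against a throwaway temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ctx-'));
fs.writeFileSync(path.join(dir, 'a.txt'), 'alpha');
fs.writeFileSync(path.join(dir, 'b.txt'), 'beta');
const prompt = packDirectory(dir);
```

The resulting `prompt` string goes into a single user message, replacing the retrieve-then-chunk step a RAG pipeline would normally need.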

Use Compatible Endpoints

For global applications, use the Singapore or US Virginia endpoints in Alibaba Cloud to minimize regional latency for international users.
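A small helper for wiring this up without hard-coding a guessed URL: the international compatible-mode URL below is the one shown in this page's quick start, while region-specific endpoints should be copied from the Alibaba Cloud Model Studio console and supplied as an override.

```javascript
// Default to the documented international compatible-mode endpoint;
// allow an explicit override (e.g. a regional endpoint from the
// Model Studio console) via configuration.
const DEFAULT_BASE_URL = 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1';

function resolveBaseURL(override) {
  return override && override.startsWith('https://') ? override : DEFAULT_BASE_URL;
}

// e.g. pass resolveBaseURL(process.env.DASHSCOPE_BASE_URL) as the
// `baseURL` option when constructing the OpenAI-compatible client.
```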

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.

Related AI Models

GLM-5

Zhipu (GLM)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context
$1.00/$3.20/1M
GLM-5.1

Zhipu (GLM)

GLM-5.1 is Zhipu AI's flagship reasoning model, featuring a 202K context window and an autonomous 8-hour execution loop for complex agentic engineering.

203K context
$1.40/$4.40/1M
GPT-5.2

OpenAI

GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.

400K context
$1.75/$14.00/1M
Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M
Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M
Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context
$3.00/$15.00/1M
Kimi K2 Thinking

Moonshot

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...

256K context
$0.60/$2.50/1M
Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context
$0.60/$3.00/1M

Frequently Asked Questions

Find answers to common questions about Qwen3.6-Max-Preview