alibaba

Qwen3-Coder-Next

Qwen3-Coder-Next is Alibaba Cloud's elite Apache 2.0 coding model, featuring an 80B MoE architecture and 256k context window for advanced local development.

Coding AI | Open Weights | Mixture of Experts | Agentic Workflows | Local LLM
alibaba | Qwen3-Coder | February 2, 2026
Context: 256K tokens
Max Output: 8K tokens
Input Price: $0.14 / 1M tokens
Output Price: $0.42 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming
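
As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the input and output rates shown above. The helper name `estimateCostUSD` and the example token counts are hypothetical, and actual billing may differ (for example through caching or tiered pricing).

```javascript
// Prices copied from the listing above (USD per 1M tokens).
const PRICE_PER_MILLION = { input: 0.14, output: 0.42 };

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * PRICE_PER_MILLION.input;
  const outputCost = (outputTokens / 1_000_000) * PRICE_PER_MILLION.output;
  return inputCost + outputCost;
}

// Example: a 200,000-token prompt with a 4,000-token reply.
const cost = estimateCostUSD(200_000, 4_000);
console.log(`$${cost.toFixed(4)}`); // prints "$0.0297"
```

Under these rates, even a prompt that fills most of the context window costs about three cents.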
Benchmarks
GPQA
53.4%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3-Coder-Next scored 53.4% on this benchmark.
HLE
28.5%
HLE: Humanity's Last Exam. A benchmark of expert-written questions across many specialized academic domains. Evaluates deep understanding of complex topics that require professional-level knowledge. Qwen3-Coder-Next scored 28.5% on this benchmark.
MMLU
86.2%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3-Coder-Next scored 86.2% on this benchmark.
MMLU Pro
78.4%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3-Coder-Next scored 78.4% on this benchmark.
SimpleQA
48.2%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to give accurate answers to short, fact-seeking questions. Measures reliability in knowledge retrieval and exposes hallucinations. Qwen3-Coder-Next scored 48.2% on this benchmark.
IFEval
89.1%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3-Coder-Next scored 89.1% on this benchmark.
AIME 2025
89.2%
AIME 2025: American Invitational Mathematics Examination. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3-Coder-Next scored 89.2% on this benchmark.
MATH
83.5%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3-Coder-Next scored 83.5% on this benchmark.
GSM8k
95.8%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3-Coder-Next scored 95.8% on this benchmark.
MGSM
92.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3-Coder-Next scored 92.5% on this benchmark.
MathVista
71.2%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Qwen3-Coder-Next scored 71.2% on this benchmark.
SWE-Bench
74.2%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3-Coder-Next scored 74.2% on this benchmark.
HumanEval
94.1%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3-Coder-Next scored 94.1% on this benchmark.
LiveCodeBench
74.5%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3-Coder-Next scored 74.5% on this benchmark.
MMMU
72.4%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Qwen3-Coder-Next scored 72.4% on this benchmark.
MMMU Pro
58.6%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Qwen3-Coder-Next scored 58.6% on this benchmark.
ChartQA
86.4%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Qwen3-Coder-Next scored 86.4% on this benchmark.
DocVQA
93.5%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Qwen3-Coder-Next scored 93.5% on this benchmark.
Terminal-Bench
58.2%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3-Coder-Next scored 58.2% on this benchmark.
ARC-AGI
12.5%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3-Coder-Next scored 12.5% on this benchmark.

About Qwen3-Coder-Next

Learn about Qwen3-Coder-Next's capabilities, features, and how it can help you achieve better results.

Model Overview

Qwen3-Coder-Next is an open-weight language model designed by Alibaba Cloud's Qwen team and optimized for coding agents and local development environments. Built on the Qwen3-Next-80B-A3B-Base architecture, it uses a Mixture-of-Experts (MoE) design with hybrid attention (Gated DeltaNet and Gated Attention). The model holds 80 billion parameters in total but activates only 3 billion per token, so it aims for flagship-level reasoning at the per-token compute cost of a much smaller model. All 80 billion parameters still have to be loaded, so the memory footprint remains that of the full model.
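
To make the total-versus-active distinction concrete, here is a back-of-the-envelope sketch of weight sizes at a few precisions. It is only arithmetic on the 80B and 3B figures from the overview; the helper `weightGigabytes` is hypothetical, and the estimate ignores the KV cache, activations, and quantization metadata.

```javascript
const TOTAL_PARAMS = 80e9;  // total parameters, from the overview above
const ACTIVE_PARAMS = 3e9;  // parameters activated per token

// Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes).
function weightGigabytes(paramCount, bitsPerWeight) {
  return (paramCount * bitsPerWeight) / 8 / 1e9;
}

for (const bits of [16, 8, 4]) {
  const resident = weightGigabytes(TOTAL_PARAMS, bits); // must fit in memory
  const touched = weightGigabytes(ACTIVE_PARAMS, bits); // roughly what one token reads
  console.log(`${bits}-bit: ~${resident} GB resident, ~${touched} GB read per token`);
}
```

At 4 bits per weight the full model comes to roughly 40 GB of weights, while each token touches only about 1.5 GB of them. That is why decoding can be fast even though the whole model must stay loaded.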

Agentic Specialization

The model represents a shift toward scaling agentic training signals rather than just raw parameter count. It has been trained on over 800,000 verifiable coding tasks paired with executable environments, enabling it to learn directly from environment feedback. This specialized training recipe emphasizes long-horizon reasoning, tool usage, and the ability to recover from execution failures—capabilities that are critical for modern "vibe coding" workflows and autonomous agentic frameworks like OpenClaw.

Local Performance

With a native 256K context window that can be extrapolated further, Qwen3-Coder-Next is positioned as a local-first coding assistant for large codebases. Released under the Apache 2.0 license, it lets developers build, debug, and ship entire codebases within a secure, private environment without relying on proprietary cloud APIs.


Use Cases for Qwen3-Coder-Next

Discover the different ways you can use Qwen3-Coder-Next to achieve great results.

Local Agentic Development

Powering autonomous coding agents that can plan, execute, and debug software locally without sensitive data leaving the machine.

Complex Web Prototyping

Generating functional full-stack applications, including 3D visualizations and interactive games, from single natural language prompts.

Large Repository Analysis

Utilizing the 256K context window to ingest and reason over entire multi-file project structures for refactoring and optimization.

Automated Security Auditing

Scanning codebases for complex vulnerabilities like SQL injection and plaintext credential exposure with grounded fix suggestions.

Technical Research Summarization

Scraping and parsing dense academic or technical documentation to produce organized, actionable HTML reports.

Cross-Language Systems Migration

Translating complex business logic and hardware-specific constraints between different programming languages with high fidelity.
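
For the large-repository use case, one practical question is which files fit in the window at all. The sketch below greedily packs files into a token budget. It assumes roughly 4 characters per token (real counts depend on the tokenizer) and assumes the 8K output allowance is reserved inside the 256K window; `estimateTokens`, `packFiles`, and the sample files are hypothetical.

```javascript
const CONTEXT_TOKENS = 256_000;     // context window from the listing
const RESERVED_FOR_OUTPUT = 8_000;  // max output from the listing (assumed to share the window)

// Assumption: ~4 characters per token. Use the real tokenizer for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Greedily add files, in the given order, until the budget is spent.
function packFiles(files, budget = CONTEXT_TOKENS - RESERVED_FOR_OUTPUT) {
  const included = [];
  const skipped = [];
  let used = 0;
  for (const file of files) {
    const cost = estimateTokens(file.path) + estimateTokens(file.content);
    if (used + cost <= budget) {
      included.push(file.path);
      used += cost;
    } else {
      skipped.push(file.path);
    }
  }
  return { included, skipped, used };
}

// Synthetic files stand in for a real repository.
const result = packFiles([
  { path: 'src/index.js', content: 'x'.repeat(400_000) },
  { path: 'src/big.js', content: 'y'.repeat(800_000) },
  { path: 'README.md', content: 'z'.repeat(4_000) },
]);
console.log(result);
```

Ordering the input by relevance (entry points and direct dependencies first) matters here, because the greedy pass never revisits a file it has skipped.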

Strengths

Exceptional Efficiency: Uses a 3B active parameter MoE architecture to deliver flagship-level coding reasoning at 10x lower inference costs.
Elite Agentic Training: Trained on 800K+ verifiable tasks, making it superior at multi-step planning and recovering from execution errors.
Massive Local Context: The 256K context window is one of the largest available for local models, enabling full-repo reasoning.
Permissive License: Released under Apache 2.0, allowing developers to fine-tune and deploy without restrictive proprietary licenses.

Limitations

Zero-Shot Complexity: Highly complex 3D simulations or architectural tasks often require 2-3 iterative prompts to reach functional perfection.
Memory Thresholds: The 45GB+ RAM requirement for high-quality quants remains a barrier for many standard developer laptops.
Minimalist Aesthetic Bias: Defaults to extremely simple, unstyled UI designs unless specifically prompted for visual flair.
Modality Restriction: Unlike the VL series, the Coder-Next model is purely text-based and cannot process visual assets directly.

API Quick Start

alibaba/qwen-3-coder-next

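// Requires the `openai` npm package; run as an ES module.
import OpenAI from 'openai';

// DashScope exposes an OpenAI-compatible endpoint, so the OpenAI SDK works with a custom baseURL.
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY, // read the key from the environment
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

async function main() {
  // Send a single-turn chat request.
  const completion = await client.chat.completions.create({
    model: 'qwen-3-coder-next',
    messages: [{ role: 'user', content: 'Write a React hook for debouncing a value.' }],
  });

  // Print the text of the first choice.
  console.log(completion.choices[0].message.content);
}

// Surface API or network errors instead of leaving an unhandled rejection.
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});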

Install the SDK and start making API calls in minutes.
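
The listing marks the model as tool-capable. Below is a minimal sketch of one tool round trip in the OpenAI-compatible message format. It runs offline against a hand-written assistant message rather than a real API response, and the `run_tests` tool, its parameters, and its stubbed result are all hypothetical.

```javascript
// Tool definition in the OpenAI-compatible "tools" format.
// `run_tests` and its parameters are hypothetical.
const tools = [
  {
    type: 'function',
    function: {
      name: 'run_tests',
      description: 'Run the test suite for one directory and return a summary.',
      parameters: {
        type: 'object',
        properties: { directory: { type: 'string' } },
        required: ['directory'],
      },
    },
  },
];

// Local implementations keyed by tool name (stubbed for illustration).
const handlers = {
  run_tests: ({ directory }) => ({ directory, passed: 12, failed: 0 }),
};

// Turn the tool calls in an assistant message into `tool` role messages
// that can be appended to the conversation for the next request.
function runToolCalls(assistantMessage) {
  return (assistantMessage.tool_calls ?? []).map((call) => {
    const handler = handlers[call.function.name];
    const args = JSON.parse(call.function.arguments); // arguments arrive as a JSON string
    const result = handler ? handler(args) : { error: `unknown tool: ${call.function.name}` };
    return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

// Hand-written message in the shape the API returns, so the sketch runs without a network call.
const mockAssistantMessage = {
  role: 'assistant',
  content: null,
  tool_calls: [
    { id: 'call_1', type: 'function', function: { name: 'run_tests', arguments: '{"directory":"src"}' } },
  ],
};

const toolMessages = runToolCalls(mockAssistantMessage);
console.log(toolMessages);
```

In a real session, `tools` would be passed alongside `model` and `messages` to `client.chat.completions.create`. The returned assistant message and the resulting `tool` messages would then be appended to `messages` before the next call.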

What People Are Saying About Qwen3-Coder-Next

See what the community thinks about Qwen3-Coder-Next

"This model is incredible for coding and stacks up favorably against the competition"
Becky Jane (YouTube)
"The architecture allows for a massive context length without ballooning VRAM"
bjan (YouTube)
"Alibaba is crushing the open-weights game with this MoE architecture"
DevGuru88 (Reddit)
"Finally a local model that handles 256k context without feeling like a snail"
AI_Explorer (X)
"I'm seeing a stable ~7.8 tok/s decode on CPU, which is plenty for a local code reviewer"
Express-Jicama-9827 (Reddit)
"Qwen3 Coder is basically the endgame for local development setups."
TechTrend_AI (X)

Videos About Qwen3-Coder-Next

Watch tutorials, reviews, and discussions about Qwen3-Coder-Next

We have a 256k context length as well, which is very robust, especially for something that can be run locally.

We have our result at a speed of 26.17 tokens per second... quite a lengthy result.

This is a very exciting model... it shows extreme potential for agentic coding.

The accuracy on Python tasks is just staggering for an open weight model.

I think this model officially kills the need for paid coding assistants for most devs.

It's built on an active 3 billion parameter in a total 80 billion parameters model.

It's not just a coding AI model with 200k context window... it's absolutely intuitive.

For everyday users, you can simply ask it to scrape a web page, analyze content, and generate a clean report.

The way it handles multi-file projects locally is a game changer for privacy.

Function calling feels much more snappy compared to the previous version.

Writing stories at 62 tokens a second. Boom. That was fast.

We are bombing right now... 150 tokens a second with batching... this is amazing.

This car racing game was actually better than the version on Claude... got to give it that.

The MoE architecture really shines when you look at the token-per-watt efficiency.

Quantization doesn't seem to hurt the logic as much as I expected.


Pro Tips for Qwen3-Coder-Next

Expert tips to help you get the most out of Qwen3-Coder-Next and achieve better results.

Hardware Bandwidth Optimization

On CPU-only setups, token generation at the 80B scale is limited mainly by memory bandwidth, so prefer a system with multi-channel, high-bandwidth memory to avoid inference bottlenecks.

Iterative Debugging

Feed the model's own runtime errors back into the prompt; it is specifically trained to recognize execution failures and refine its logic.

Context-Rich Prompting

Make use of the 256K window by providing relevant dependency files and architecture notes to reduce hallucinations. Because the model is text-only, supply diagrams as text descriptions rather than images.

Aesthetic Refinement

When generating UI, explicitly request color and CSS transitions to override the model's default tendency toward minimalist layouts.
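
The iterative-debugging tip above can be sketched as a small generate, run, and retry loop. Everything here is illustrative. `fakeModel` is a hypothetical stand-in for a real model call (it returns a buggy answer first and a fixed one once the prompt contains an error message), and `new Function` is used only because the candidate code is hard-coded. Real model output should be executed in a sandbox.

```javascript
// Hypothetical stand-in for a model call: returns a function body for the prompt.
async function fakeModel(prompt) {
  return prompt.includes('Error')
    ? 'return values.reduce((sum, v) => sum + v, 0);'
    : 'return values.reduce((sum, v) => sum + v);'; // throws on an empty array
}

// Run the candidate body against two checks; return an error string, or null on success.
function runCandidate(body) {
  try {
    const sum = new Function('values', body);
    if (sum([1, 2, 3]) !== 6) return 'AssertionError: sum([1, 2, 3]) should be 6';
    if (sum([]) !== 0) return 'AssertionError: sum([]) should be 0';
    return null;
  } catch (err) {
    return `${err.name}: ${err.message}`;
  }
}

async function generateWithFeedback(task, model, maxAttempts = 3) {
  let prompt = task;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await model(prompt);
    const error = runCandidate(body);
    if (error === null) return { body, attempts: attempt };
    // Feed the failing code and its runtime error back into the next prompt.
    prompt = `${task}\n\nPrevious attempt:\n${body}\n\nIt failed with:\n${error}\n\nFix it.`;
  }
  throw new Error(`no passing solution after ${maxAttempts} attempts`);
}

generateWithFeedback('Write the body of a function sum(values).', fakeModel)
  .then(({ attempts }) => console.log(`passed after ${attempts} attempt(s)`));
```

Capping the number of attempts keeps a model that cannot fix the failure from looping forever, and including the previous code next to the error gives it something concrete to revise.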

Related AI Models

MiniMax M2.5 (MiniMax)
MiniMax M2.5 is a SOTA MoE model featuring a 1M context window and elite agentic coding capabilities at disruptive pricing for autonomous agents.
1M context | $0.30/$1.20/1M

GLM-5 (Zhipu (GLM))
GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.
200K context | $1.00/$3.20/1M

Qwen-Image-2.0 (Alibaba)
Qwen-Image-2.0 is Alibaba's unified 7B model for professional infographics, photorealism, and precise image editing with native 2K resolution and 1k-token...
1K context | $0.07/1M

GPT-5.3 Codex (OpenAI)
GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...
400K context | $1.75/$14.00/1M

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
200K context | $5.00/$25.00/1M

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
262K context | $0.60/$2.50/1M

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context | $0.28/$0.42/1M

PixVerse-R1 (Other)
PixVerse-R1 is a next-gen real-time world model by AIsphere, offering interactive 1080p video generation with instant response and physics-aware continuity.
