Qwen3-Coder-Next

Qwen3-Coder-Next is Alibaba Cloud's Apache 2.0-licensed coding model, featuring an 80B-parameter MoE architecture with 3B parameters active per token and a 256K (262,144-token) context window for local development.

Coding AI, Open Weights, Mixture of Experts, Agentic Workflows, Local LLM
Alibaba, Qwen3, February 3, 2026
Context: 262K tokens
Max Output: 8K tokens
Input Price: $0.12 / 1M tokens
Output Price: $0.75 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming
Benchmarks
GPQA
40.5%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Qwen3-Coder-Next scored 40.5% on this benchmark.
HLE
32%
HLE: Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains, using questions that require professional-level knowledge. Qwen3-Coder-Next scored 32% on this benchmark.
MMLU
82.5%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Qwen3-Coder-Next scored 82.5% on this benchmark.
MMLU Pro
65.4%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Qwen3-Coder-Next scored 65.4% on this benchmark.
SimpleQA
42%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Qwen3-Coder-Next scored 42% on this benchmark.
IFEval
85%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Qwen3-Coder-Next scored 85% on this benchmark.
AIME 2025
78%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Qwen3-Coder-Next scored 78% on this benchmark.
MATH
75.9%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Qwen3-Coder-Next scored 75.9% on this benchmark.
GSM8k
91.6%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Qwen3-Coder-Next scored 91.6% on this benchmark.
MGSM
88.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Qwen3-Coder-Next scored 88.5% on this benchmark.
SWE-Bench
70.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Qwen3-Coder-Next scored 70.8% on this benchmark.
HumanEval
92.7%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Qwen3-Coder-Next scored 92.7% on this benchmark.
LiveCodeBench
52%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Qwen3-Coder-Next scored 52% on this benchmark.
Terminal-Bench
52%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Qwen3-Coder-Next scored 52% on this benchmark.
ARC-AGI
14%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Qwen3-Coder-Next scored 14% on this benchmark.

About Qwen3-Coder-Next

Learn about Qwen3-Coder-Next's capabilities, features, and how it can help you achieve better results.

Model Architecture

Qwen3-Coder-Next is a specialized open-weight model designed by Alibaba Cloud for software engineering agents. It utilizes a Mixture-of-Experts (MoE) architecture with 80 billion total parameters, but it only activates 3 billion parameters per token. This design combines the intelligence of a massive model with the inference speed of a small one. The architecture includes a hybrid attention mechanism, integrating Gated DeltaNet with standard Gated Attention to process contexts up to 262,144 tokens.
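
As a rough illustration of how sparse activation works in general, the sketch below routes an input to the top-k of several toy experts and mixes their outputs. It is not Qwen3-Coder-Next's actual router: the expert count, the value of k, the scalar "experts", and the helper names softmax, routeTopK, and moeForward are all assumptions made for this example. In a real MoE layer the router logits are computed from the token by a learned projection; here they are passed in directly.

```javascript
// Toy top-k expert routing. All sizes and names here are illustrative only.

// Numerically stable softmax over an array of router logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the k most probable experts and renormalize their weights to sum to 1.
function routeTopK(routerLogits, k) {
  const top = softmax(routerLogits)
    .map((p, index) => ({ index, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const norm = top.reduce((s, e) => s + e.p, 0);
  return top.map((e) => ({ expert: e.index, weight: e.p / norm }));
}

// Only the selected experts run; the others contribute no compute for this input.
function moeForward(x, experts, routerLogits, k) {
  return routeTopK(routerLogits, k).reduce(
    (acc, { expert, weight }) => acc + weight * experts[expert](x),
    0
  );
}

// Four toy experts, two active per input.
const toyExperts = [(x) => x, (x) => 2 * x, (x) => 3 * x, (x) => 4 * x];
console.log(moeForward(1, toyExperts, [0, 5, 0, 5], 2));
```

The point of the sketch is the cost profile: every expert's weights must be stored, but only k of them are evaluated per input, which is why total and active parameter counts can differ so widely.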

Agentic Specialization

The model is trained on over 800,000 verifiable coding tasks and executable environments. This training emphasizes long-horizon reasoning and the ability to recover from execution failures. It scores 70.8% on SWE-Bench Verified, demonstrating its capacity to handle multi-step development tasks from initial planning to final code execution. It excels in autonomous agentic frameworks like OpenClaw and Qwen Code.

Deployment and Privacy

Licensed under Apache 2.0, the model suits developers who need local, private development environments. With quantization, it can run on consumer-grade hardware that has enough RAM (roughly 45GB in total for 4-bit weights). The 262,144-token context window allows repository-scale analysis in a single prompt, avoiding the chunking that smaller context windows force.

Use Cases

Discover the different ways you can use Qwen3-Coder-Next to achieve great results.

Autonomous Coding Agents

Powers frameworks to handle multi-step development tasks from planning to final execution.

Local Private Development

Runs elite coding assistance on consumer GPUs with 16GB VRAM using quantized MoE layers.

Large-Scale Repository Analysis

Processes entire codebases within its 256k window to identify technical debt.

Code Repair and Refactoring

Updates legacy code to modern standards, using feedback from executable environments to verify its changes.

Multilingual Scripting

Generates high-fidelity code across more than 40 programming languages including Rust and Go.

Interactive 3D Simulation

Builds complex web-based visualizers and simulations using rapid one-shot generation.

Strengths

MoE Efficiency: Operates with 3B active parameters for consumer hardware while maintaining 80B-class intelligence.
Agentic Specialization: Scores 70.8% on SWE-Bench Verified, demonstrating superior multi-turn problem-solving.
Massive Native Context: The 262,144 token window supports repo-scale analysis without performance degradation.
Permissive Licensing: Released under Apache 2.0, enabling unrestricted commercial use and private local hosting.

Limitations

System RAM Requirements: The 80B total parameter count requires roughly 45GB of total RAM for effective 4-bit quantization.
Recurrent State Limitations: Hybrid attention architecture makes self-speculative decoding unsupported in common inference engines.
Text-Only Constraints: Lacks multimodal vision capabilities, preventing it from debugging layouts from screenshots.
High-Complexity Physics: May struggle with one-shot generation of extreme 3D physics logic compared to dense flagship models.
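
The RAM figure quoted above can be sanity-checked with simple arithmetic. The sketch below estimates the memory taken by the weights alone at a given bit width; the helper name weightMemoryGB is hypothetical, and the calculation deliberately ignores quantization scales, KV cache, and runtime buffers, which is presumably where the gap between 40GB and the quoted 45GB comes from (that attribution is an assumption, not something the source states).

```javascript
// Back-of-the-envelope weight memory estimate (decimal gigabytes).
// Ignores quantization metadata, KV cache, and runtime overhead.
function weightMemoryGB(totalParams, bitsPerWeight) {
  const bytes = (totalParams * bitsPerWeight) / 8;
  return bytes / 1e9;
}

const TOTAL_PARAMS = 80e9; // total parameters, from the model description
const ACTIVE_PARAMS = 3e9; // parameters active per token, from the model description

console.log(weightMemoryGB(TOTAL_PARAMS, 4)); // 40
console.log(weightMemoryGB(TOTAL_PARAMS, 16)); // 160
console.log(ACTIVE_PARAMS / TOTAL_PARAMS); // 0.0375
```

All 80B parameters have to be resident somewhere even though only about 3.75% of them are used for any single token, so memory capacity rather than compute tends to be the constraint for local hosting.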

API Quick Start

alibaba/qwen-3-coder-next

alibaba SDK
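import OpenAI from "openai";

// The endpoint below is OpenAI-compatible, so the standard OpenAI SDK is used as the client.
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY, // read the API key from the environment
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

async function main() {
  // Send a system prompt plus one user request and wait for the full reply.
  const completion = await client.chat.completions.create({
    model: "qwen3-coder-next",
    messages: [
      { role: "system", content: "You are a professional coding assistant." },
      { role: "user", content: "Write a React component for a sortable list." },
    ],
  });
  console.log(completion.choices[0].message.content);
}

// Report request failures instead of leaving an unhandled promise rejection.
main().catch((err) => {
  console.error(err);
  process.exit(1);
});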

Install the SDK and start making API calls in minutes.
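
Since streaming is listed among the model's capabilities, a streamed variant of the quick-start call might look like the sketch below. It follows the OpenAI Node SDK's usual pattern of passing stream: true and iterating the result with for await. The helper name streamCompletion and its onToken callback are hypothetical, and the client is passed in as an argument so the function can be exercised without network access; whether this particular endpoint streams this model exactly this way is an assumption.

```javascript
// Streams a chat completion and returns the concatenated text.
// `client` is assumed to be an OpenAI SDK client configured as in the quick start.
async function streamCompletion(client, prompt, onToken = () => {}) {
  const stream = await client.chat.completions.create({
    model: "qwen3-coder-next",
    messages: [{ role: "user", content: prompt }],
    stream: true, // ask for incremental chunks instead of one final message
  });

  let text = "";
  for await (const chunk of stream) {
    // Some chunks carry no text (for example the final one), so default to "".
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta) {
      text += delta;
      onToken(delta);
    }
  }
  return text;
}
```

A caller could pass (token) => process.stdout.write(token) as onToken to print the reply as it arrives, which matters most for long code generations where waiting for the full response feels slow.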

Community Feedback

See what the community thinks about Qwen3-Coder-Next

"Nearly matches Claude in overall coding capabilities. Beats Claude 3.5 Sonnet on HumanEval at 92.7%." (Philipp Schmid, Twitter)
"The efficiency of the MoE version is insane for local hardware. I am getting 26 TPS on a mid-range system." (LocalAI_Dev, Reddit)
"Self-speculative decoding is mathematically impossible for Qwen Coder Next due to recurrent states." (GodComplecs, Reddit)
"Qwen3-Coder-Next is based on MoE, and way stronger and smarter than before!" (JustinLin610, Twitter)
"Demonstrating the ability to switch providers mid-project with the new 480B model variants." (saveralter, Reddit)
"The agentic training recipe on 800k tasks shows in the way it recovers from build errors." (TechGurus, Hacker News)

Related Videos

Watch tutorials, reviews, and discussions about Qwen3-Coder-Next

Allows it to be accessible for folks wanting to play with local AI coding agents

This to me is screaming open code test this model which I will do

The memory efficiency on this thing is a huge win

It handles complex logic better than the previous 72B dense model

This is the first open model that actually follows my terminal commands correctly

Qwen 3 coder Next also has only 3 billion active parameters to run on consumer graphics card

It works beautifully. I am really amazed I can get this result in one shot from local AI

80 billion parameters usually requires a cluster, but the MoE approach changes everything

It handles 40+ programming languages without any noticeable performance drop

Using it with OpenClaw makes it feel like having a junior dev on the team

Three billion parameter model going head-to-head with models 10 to 20 times of its size

Qwen 3 comes with a lot of advantages but with a lower cost

The 256k context is real, it did not hallucinate the middle of my project

The latency is surprisingly low given the 80B total parameter weight

It fixed a bug in my legacy Go repo that GPT-4o missed three times

Pro Tips

Expert tips to help you get the most out of Qwen3-Coder-Next and achieve better results.

Use Long System Prompts

Provide the model with detailed examples and documentation to align its agentic behavior.

Iterative Error Feedback

Feed browser console error logs back into the model so it can correct its own output, which tends to succeed at a high rate.
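
One way to structure such a feedback loop is sketched below. The function repairLoop and its generate and run arguments are hypothetical placeholders supplied by the caller (for example, a wrapper around a chat-completions call and a test or browser runner); none of them are part of any real SDK, and the retry limit of three is an arbitrary choice.

```javascript
// Generate code, run it, and feed any error log back to the model until it passes.
// `generate(messages)` should resolve to a code string.
// `run(code)` should resolve to { ok: boolean, errorLog?: string }.
async function repairLoop(task, generate, run, maxAttempts = 3) {
  const messages = [{ role: "user", content: task }];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generate(messages);
    const result = await run(code);
    if (result.ok) return { code, attempts: attempt };

    // Keep the failed attempt and its error log in the conversation history.
    messages.push({ role: "assistant", content: code });
    messages.push({
      role: "user",
      content: `The code failed with this error log:\n${result.errorLog}\nPlease fix it.`,
    });
  }
  return { code: null, attempts: maxAttempts };
}
```

Keeping the failed attempt in the history alongside the raw log gives the model both what it wrote and what went wrong, and the attempt cap stops a task that never converges from looping forever.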

Optimize Layer Offloading

Offload specific MoE expert layers to system RAM to balance inference speed against available GPU memory.

Align Sampling Parameters

Use a temperature of 1.0 with top_p 0.95 and top_k 40 for the most accurate coding results.
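
These settings could be kept in one place and merged into each request body, as in the sketch below. The helper buildChatRequest is hypothetical. Note that top_k is not part of the standard OpenAI chat-completions schema, so whether a given endpoint or SDK accepts it as a top-level field is an assumption to verify against the provider's documentation; local inference servers often expose it under their own option names.

```javascript
// Sampling settings quoted in the tip above, kept in a single frozen object.
const SAMPLING = Object.freeze({ temperature: 1.0, top_p: 0.95, top_k: 40 });

// Hypothetical helper: builds a chat-completions request body with those settings.
function buildChatRequest(
  userPrompt,
  systemPrompt = "You are a professional coding assistant."
) {
  return {
    model: "qwen3-coder-next",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt },
    ],
    ...SAMPLING, // top_k support depends on the endpoint (assumption)
  };
}

console.log(buildChatRequest("Write a binary search in Go."));
```

Centralizing the parameters avoids silent drift between scripts, which is an easy way to end up comparing outputs produced under different sampling settings.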

Related AI Models

GPT-4o mini (OpenAI)
OpenAI's most cost-efficient small model, GPT-4o mini offers multimodal intelligence and high-speed performance at a significantly lower price point.
128K context, $0.15/$0.60/1M

GLM-4.7 (Zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context, $0.60/$2.20/1M

MiniMax M2.5 (MiniMax)
MiniMax M2.5 is a SOTA MoE model featuring a 1M context window and elite agentic coding capabilities at disruptive pricing for autonomous agents.
1M context, $0.15/$1.20/1M

GPT-5.4 (OpenAI)
GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.
1M context, $2.50/$15.00/1M

Gemini 3.1 Flash-Lite (Google)
Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.
1M context, $0.25/$1.50/1M

GPT-5.3 Instant (OpenAI)
Explore GPT-5.3 Instant, OpenAI's "Anti-Cringe" model. Features a 128K context window, 26.8% fewer hallucinations, and a natural, helpful tone for everyday...
128K context, $1.75/$14.00/1M

Gemini 3.1 Pro (Google)
Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.
1M context, $2.00/$12.00/1M

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context, $3.00/$15.00/1M

Frequently Asked Questions

Find answers to common questions about Qwen3-Coder-Next