MiniMax M2.5

MiniMax M2.5 is a state-of-the-art Mixture-of-Experts (MoE) model featuring a 1M-token context window and elite agentic coding capabilities at disruptive pricing for autonomous agents.

Agentic AI · MoE Architecture · Coding Specialist · Cost Efficient
MiniMax · MiniMax M-Series · February 12, 2026
Context: 1.0M tokens
Max Output: 128K tokens
Input Price: $0.30 / 1M tokens
Output Price: $1.20 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
62%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). MiniMax M2.5 scored 62% on this benchmark.
HLE
28%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning mathematics, the natural sciences, and the humanities, designed to test reasoning at the limits of human expertise. Even top models score well below expert level. MiniMax M2.5 scored 28% on this benchmark.
MMLU
85%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. MiniMax M2.5 scored 85% on this benchmark.
MMLU Pro
76.5%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. MiniMax M2.5 scored 76.5% on this benchmark.
SimpleQA
44%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. MiniMax M2.5 scored 44% on this benchmark.
IFEval
87.5%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. MiniMax M2.5 scored 87.5% on this benchmark.
AIME 2025
45%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. MiniMax M2.5 scored 45% on this benchmark.
MATH
72%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. MiniMax M2.5 scored 72% on this benchmark.
GSM8k
95.8%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. MiniMax M2.5 scored 95.8% on this benchmark.
MGSM
92.4%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. MiniMax M2.5 scored 92.4% on this benchmark.
MathVista
65%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. MiniMax M2.5 scored 65% on this benchmark.
SWE-Bench
80.2%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. MiniMax M2.5 scored 80.2% on this benchmark.
HumanEval
89.6%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. MiniMax M2.5 scored 89.6% on this benchmark.
LiveCodeBench
65%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. MiniMax M2.5 scored 65% on this benchmark.
MMMU
68%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. MiniMax M2.5 scored 68% on this benchmark.
MMMU Pro
54%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. MiniMax M2.5 scored 54% on this benchmark.
ChartQA
88%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. MiniMax M2.5 scored 88% on this benchmark.
DocVQA
93.2%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. MiniMax M2.5 scored 93.2% on this benchmark.
Terminal-Bench
52%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. MiniMax M2.5 scored 52% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. MiniMax M2.5 scored 12% on this benchmark.

About MiniMax M2.5

Learn about MiniMax M2.5's capabilities, features, and how it can help you achieve better results.

High-Efficiency Frontier Intelligence

MiniMax M2.5 represents a major breakthrough in the efficiency of frontier-class AI. As a Mixture-of-Experts (MoE) model, it utilizes a sparse architecture with 230 billion total parameters, but only activates 10 billion parameters per token. This design allows it to deliver performance competitive with global flagship models while remaining significantly faster and cheaper to operate. Released in early 2026, it is specifically optimized for "agentic" workloads where AI must plan, execute, and self-correct across multi-step tasks.
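
To make the sparse-activation idea concrete, here is a minimal, illustrative sketch of top-k expert routing in JavaScript. It is not MiniMax's published implementation; the expert count, the value of k, and the uniform weighting are placeholder assumptions.

// Illustrative top-k Mixture-of-Experts routing (not MiniMax's actual implementation).
// A router scores every expert for the current token, but only the k highest-scoring
// experts execute, so per-token compute stays small even when total parameters are huge.
function topK(scores, k) {
  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.index);
}

function moeForward(x, experts, routerScores, k = 2) {
  const output = new Array(x.length).fill(0);
  for (const i of topK(routerScores, k)) {
    const y = experts[i](x); // only the selected experts run
    for (let d = 0; d < output.length; d++) output[d] += y[d] / k; // placeholder uniform weighting
  }
  return output;
}

// Example: 8 toy experts that scale the input; only 2 of them run for this "token".
const experts = Array.from({ length: 8 }, (_, i) => (x) => x.map((v) => v * (i + 1)));
console.log(moeForward([1, 2, 3], experts, [0.1, 0.9, 0.2, 0.8, 0.3, 0.1, 0.05, 0.4], 2));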

Architectural Reasoning and Coding

One of the most distinctive features of M2.5 is its emergent architectural thinking. Unlike standard LLMs that generate code linearly, M2.5 is trained to map out project hierarchies and logic structures before writing files. This capability, combined with a 1-million-token context window, makes it a premier choice for autonomous software engineering, large-scale code reviews, and complex repository management. It supports over 10 programming languages and features native throughput of up to 100 tokens per second.
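
As a sketch of how that context window can be used for repository-scale review, the example below concatenates several source files into one request. It assumes the OpenAI-compatible endpoint shown in the API Quick Start further down; the file paths and the review instructions are hypothetical placeholders.

import OpenAI from "openai";
import { readFileSync } from "node:fs";

const client = new OpenAI({
  apiKey: process.env.MINIMAX_API_KEY,
  baseURL: "https://api.minimax.chat/v1",
});

// Hypothetical file list; with a 1M-token window, a large slice of a repository fits in one request.
const files = ["src/index.js", "src/router.js", "src/db.js"];
const repoDump = files
  .map((path) => `// FILE: ${path}\n${readFileSync(path, "utf8")}`)
  .join("\n\n");

async function reviewRepo() {
  const review = await client.chat.completions.create({
    model: "minimax-m2.5",
    messages: [
      { role: "system", content: "Plan like an architect: outline module boundaries before commenting on individual files." },
      { role: "user", content: `Review this repository for structural issues:\n\n${repoDump}` },
    ],
  });
  console.log(review.choices[0].message.content);
}

reviewRepo();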

Use Cases for MiniMax M2.5

Discover the different ways you can use MiniMax M2.5 to achieve great results.

Agentic Software Engineering

Autonomous generation and testing of multi-file projects within sandbox environments using Architect mode.

High-Precision Office Automation

Executing complex tasks across Word, PowerPoint, and Excel including professional financial modeling.

Autonomous Web Research

Navigating information-dense webpages to perform expert-level information retrieval and synthesis.

Bilingual Technical Support

Providing native-level fluency in both Chinese and English for complex debugging and architectural planning.

3D Simulation Prototyping

Generating functional 3D environments and interactive components (e.g., with Three.js) in a single shot.

Enterprise Code Review

Performing comprehensive code reviews and system testing across 10+ programming languages with architectural oversight.

Strengths

Disruptive Cost-Efficiency: At $0.30/$1.20 per 1M tokens, it delivers elite intelligence for a fraction of the price of global competitors.
Architectural Planning: The model displays a unique ability to map out project hierarchies and logic structures before generating code.
Extreme Inference Speed: Native serving at 100 TPS makes it one of the fastest frontier-class models for interactive workflows.
Elite Coding Performance: Specifically optimized for real-world software engineering, achieving 80.2% on SWE-Bench Verified.

Limitations

Occasional Logic Errors: Initial 'one-shot' code can contain functional errors such as logic inconsistencies in complex animations.
Geographical Latency: Users outside the Asia-Pacific region may experience higher latency without local edge deployment centers.
World Knowledge Gaps: While technically accurate, it can occasionally struggle with precise alignment to niche real-world objects in 3D generations.
Instruction Sensitivity: May ignore 'single-script' constraints for complex tasks unless prompted very specifically to avoid multi-file sprawl.

API Quick Start

minimax/minimax-m2.5

View Documentation
MiniMax API (OpenAI-compatible SDK)
import OpenAI from "openai";

// MiniMax exposes an OpenAI-compatible chat completions endpoint, so the standard
// OpenAI SDK works once apiKey and baseURL point at MiniMax.
const client = new OpenAI({
  apiKey: process.env.MINIMAX_API_KEY,
  baseURL: "https://api.minimax.chat/v1",
});

async function main() {
  const response = await client.chat.completions.create({
    model: "minimax-m2.5",
    messages: [{ role: "user", content: "Plan like an architect and code a 3D Formula 1 car drifting." }],
  });
  console.log(response.choices[0].message.content);
}

main();

Install the SDK and start making API calls in minutes.

What People Are Saying About MiniMax M2.5

See what the community thinks about MiniMax M2.5

"MiniMax M2.5 is a top tier coding and agentic model that's much faster and drastically cheaper."
WorldofAI
YouTube
"The speed of M2.5 compounds fast in agent loops. It's purpose-built for always-on production workloads."
MarketingNetMind
Reddit
"It feels more like a tireless helper than a slow bot. The speed is a real game changer for my setup."
bruckout
Reddit
"This looks like a real game changer... cost is one-tenth that of proprietary flagship models."
Techmeme
Facebook
"It reaches 80.2% on SWE Bench Verified. This is an order of magnitude shift for agent economics."
jackhnels
X
"The architectural planning mode is finally making autonomous coding agents reliable enough for dev teams."
logic_pro
Hacker News

Videos About MiniMax M2.5

Watch tutorials, reviews, and discussions about MiniMax M2.5

Finally makes the idea of intelligence too cheap to meter truly realistic.

The quality is definitely there... remarkably functional even for complex frontend animations.

This model is absolutely eating coding benchmarks for breakfast right now.

Its ability to self-correct during the agent loop is what sets it apart from M2.1.

I haven't seen this level of price-to-performance in any other release this year.

A significant improvement from previous generations is M2.5's ability to think and plan like an architect.

This thing is going to come out as being a very very potent agentic coding tool.

Notice how it breaks down the folder structure before writing the actual React components.

The reasoning capabilities here are punching way above its active parameter weight.

If you're building autonomous dev agents, you need to be testing this model immediately.

If you want to use this for your own workflow, you would probably get pretty good results for coding.

They're certainly not falling behind... they're getting closer in terms of overall performance.

The multimodal vision support handles complex UI wireframes better than some proprietary models.

We're seeing a trend where speed is becoming as important as raw intelligence for agents.

M2.5 represents the maturation of the MiniMax ecosystem for global developers.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for MiniMax M2.5

Expert tips to help you get the most out of MiniMax M2.5 and achieve better results.

Leverage Architect Mode

Explicitly prompt the model to 'plan like an architect' to trigger its deeper reasoning and file-structure decomposition.
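
One way to phrase that prompt is sketched below; the JSON-plan request is a suggested pattern for making the planning step machine-readable, not an official mode switch, and the task is just an example. Send the messages with the client shown in the API Quick Start above.

// Prompt pattern only; pass these messages to the client from the Quick Start above.
const architectMessages = [
  {
    role: "system",
    content:
      "Plan like an architect. First return a JSON object mapping file paths to one-line responsibilities, then stop and wait for approval before writing any code.",
  },
  { role: "user", content: "Scaffold a Node.js CLI that converts CSV files to JSON." },
];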

Use Iterative Feedback

For complex 3D or SVG animations, provide feedback on functional errors to leverage the model's agentic self-correction.
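
A sketch of that feedback loop using plain chat-completions calls (no special self-correction flag is assumed): append the model's previous answer plus the concrete error you observed, then ask again.

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MINIMAX_API_KEY,
  baseURL: "https://api.minimax.chat/v1",
});

async function iterate() {
  const history = [
    { role: "user", content: "Write an SVG animation of a bouncing ball with squash-and-stretch." },
  ];

  const first = await client.chat.completions.create({ model: "minimax-m2.5", messages: history });
  history.push({ role: "assistant", content: first.choices[0].message.content ?? "" });

  // Describe the functional error you actually observed, then request a targeted fix.
  history.push({ role: "user", content: "The ball squashes on the way up instead of on impact. Fix the keyframe timing." });

  const fixed = await client.chat.completions.create({ model: "minimax-m2.5", messages: history });
  console.log(fixed.choices[0].message.content);
}

iterate();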

Manage Prompt Caching

Take advantage of the 1M context window by caching large documentation sets to reduce costs by up to 90%.
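
The sketch below only shows the message layout that makes prefix reuse possible: keep the large documentation block as an unchanging leading message and vary only the final user turn. The exact caching mechanism and discount are provider-specific, and the documentation path is a placeholder.

import OpenAI from "openai";
import { readFileSync } from "node:fs";

const client = new OpenAI({
  apiKey: process.env.MINIMAX_API_KEY,
  baseURL: "https://api.minimax.chat/v1",
});

const docs = readFileSync("docs/api-reference.md", "utf8"); // hypothetical large documentation set

async function ask(question) {
  const response = await client.chat.completions.create({
    model: "minimax-m2.5",
    messages: [
      // Stable prefix: identical across requests, so prefix-based caching can apply.
      { role: "system", content: `Answer questions using this documentation:\n\n${docs}` },
      // Only this part changes between requests.
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content;
}

ask("Which endpoint lists all projects?").then(console.log);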

Toggle Lightning Version

Use the Lightning version for real-time interactive UI coding to achieve 100 TPS speeds.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.

Related AI Models

GLM-5

Zhipu (GLM)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context
$1.00/$3.20/1M
Qwen3-Coder-Next

Alibaba

Qwen3-Coder-Next is Alibaba Cloud's elite Apache 2.0 coding model, featuring an 80B MoE architecture and 256k context window for advanced local development.

256K context
$0.14/$0.42/1M
Qwen-Image-2.0

Alibaba

Qwen-Image-2.0 is Alibaba's unified 7B model for professional infographics, photorealism, and precise image editing with native 2K resolution and 1k-token...

1K context
$0.07/1M
GPT-5.3 Codex

OpenAI

GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...

400K context
$1.75/$14.00/1M
Claude Opus 4.6

Anthropic

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

200K context
$5.00/$25.00/1M
Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

262K context
$0.60/$2.50/1M
DeepSeek-V3.2-Speciale

DeepSeek

DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...

131K context
$0.28/$0.42/1M
PixVerse-R1

Other

PixVerse-R1 is a next-gen real-time world model by AIsphere, offering interactive 1080p video generation with instant response and physics-aware continuity.

Frequently Asked Questions About MiniMax M2.5

Find answers to common questions about MiniMax M2.5