Kimi K2.5

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 256K context window, and SOTA reasoning.

Agentic AI, Multimodal, Open Source, Reasoning, MoE
moonshot | Kimi | January 27, 2026
Context: 256K tokens
Max Output: 66K tokens
Input Price: $0.60 / 1M tokens
Output Price: $3.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
87.6%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi K2.5 scored 87.6% on this benchmark.
HLE
50.2%
HLE: Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge. Kimi K2.5 scored 50.2% on this benchmark.
MMLU
91.5%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi K2.5 scored 91.5% on this benchmark.
MMLU Pro
87.1%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi K2.5 scored 87.1% on this benchmark.
SimpleQA
48%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and the tendency to hallucinate in knowledge retrieval tasks. Kimi K2.5 scored 48% on this benchmark.
IFEval
85%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi K2.5 scored 85% on this benchmark.
AIME 2025
96.1%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi K2.5 scored 96.1% on this benchmark.
MATH
90.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi K2.5 scored 90.1% on this benchmark.
GSM8k
97.1%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi K2.5 scored 97.1% on this benchmark.
MGSM
95%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi K2.5 scored 95% on this benchmark.
MathVista
90.1%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi K2.5 scored 90.1% on this benchmark.
SWE-Bench
76.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi K2.5 scored 76.8% on this benchmark.
HumanEval
88%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi K2.5 scored 88% on this benchmark.
LiveCodeBench
85%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi K2.5 scored 85% on this benchmark.
MMMU
78.5%
MMMU: Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects that require both image understanding and expert knowledge. Kimi K2.5 scored 78.5% on this benchmark.
MMMU Pro
78.5%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi K2.5 scored 78.5% on this benchmark.
ChartQA
77.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi K2.5 scored 77.5% on this benchmark.
DocVQA
88.8%
DocVQA: Document Visual Question Answering. Tests the ability to extract and reason about information from document images, including forms, reports, and scanned text. Kimi K2.5 scored 88.8% on this benchmark.
Terminal-Bench
50.8%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi K2.5 scored 50.8% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern-recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi K2.5 scored 12% on this benchmark.

About Kimi K2.5

Learn about Kimi K2.5's capabilities, features, and how it can help you achieve better results.

Kimi K2.5 is an open-source multimodal model from Moonshot AI. It uses a 1 trillion parameter Mixture-of-Experts architecture where 32 billion parameters are active per token. The system unifies text, image, and video processing through a single reasoning framework rather than using separate external encoders for each modality. This architecture allows the model to handle 256,000 tokens of context while maintaining high retrieval accuracy and logical consistency across very long sequences.
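As a rough illustration of what a 256,000-token window means in practice, the sketch below estimates whether a set of files fits in context. The 4-characters-per-token ratio is a common rule of thumb and an assumption here, not the model's actual tokenizer ratio. It also assumes that generated output counts against the same window. `estimateTokens` and `fitsInContext` are hypothetical helper names.

```javascript
// Rough context-budget check. CHARS_PER_TOKEN is an assumed heuristic,
// not the real Kimi K2.5 tokenizer ratio.
const CONTEXT_LIMIT = 256000; // context window stated on this page
const CHARS_PER_TOKEN = 4;    // assumption: about 4 characters per token

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns the estimated prompt size and whether it still leaves room
// for `reservedOutput` tokens of generated text.
function fitsInContext(texts, reservedOutput = 0) {
  const promptTokens = texts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return {
    promptTokens,
    fits: promptTokens + reservedOutput <= CONTEXT_LIMIT,
  };
}

// Two placeholder "files" of 400,000 and 200,000 characters,
// reserving 66,000 tokens for the reply.
const files = ['a'.repeat(400000), 'b'.repeat(200000)];
console.log(fitsInContext(files, 66000));
```

For a real budget, the estimate should be replaced with token counts reported by the API itself.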

The model is distinguished by its Agent Swarm capability. This feature allows the system to coordinate up to 100 parallel sub-agents to execute complex research or engineering tasks simultaneously. By integrating a 400M parameter MoonViT-3D encoder, K2.5 can analyze several hours of video content with temporal precision. It is specifically designed for autonomous execution, outperforming many proprietary models on agentic benchmarks like SWE-Bench and BrowseComp.
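The fan-out idea behind Agent Swarm can be pictured from the client side. The sketch below only splits a list of work items across at most 100 workers in round-robin order. It is an illustration with hypothetical names (`assignToSubAgents`, `MAX_SUB_AGENTS`), not a description of how the model schedules its sub-agents internally.

```javascript
const MAX_SUB_AGENTS = 100; // upper bound quoted on this page

// Distributes items across up to `maxAgents` batches, round-robin,
// so batch sizes differ by at most one.
function assignToSubAgents(items, maxAgents = MAX_SUB_AGENTS) {
  const agentCount = Math.min(items.length, maxAgents);
  const batches = Array.from({ length: agentCount }, () => []);
  items.forEach((item, i) => batches[i % agentCount].push(item));
  return batches;
}

// Example: 250 placeholder source IDs spread over 100 batches.
const sources = Array.from({ length: 250 }, (_, i) => `source-${i}`);
const batches = assignToSubAgents(sources);
console.log(batches.length, batches[0].length, batches[99].length);
```

With 250 items, the first 50 batches receive three items each and the remaining 50 receive two.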

Kimi K2.5 provides a dedicated Thinking mode for tasks requiring deep logic. When enabled, the model generates an internal chain of reasoning to self-correct and verify steps before producing a final answer. This makes it highly effective for competition-level mathematics and large-scale software development. Its token economics are optimized for enterprise deployment, offering frontier-level intelligence at a fraction of the cost of competing closed-source systems.

Use Cases

Discover the different ways you can use Kimi K2.5 to achieve great results.

Autonomous Software Engineering

Solving complex GitHub issues and building multi-file project architectures, the kind of work that SWE-Bench measures.

Visual Web Development

Creating functional frontend code and UI designs directly from screen recordings of existing website interactions.

Multi-Threaded Research

Using Agent Swarm to crawl and synthesize information from over 100 sources in a single parallel workflow.

Long Video Analysis

Extracting specific events and temporal data from hours of security or lecture footage without frame extraction tools.

Mathematical Proof Generation

Applying the Thinking mode to competition-level math problems; the model scored 96.1% on AIME 2025.

Enterprise Document Automation

Generating multi-page PDF reports and complex financial spreadsheets from unstructured business data sources.

Strengths

Elite Agentic Performance: Scores 76.8 on SWE-Bench Verified, outperforming many proprietary frontier models in software engineering tasks.
Unmatched Token Economics: Provides 1T parameter MoE intelligence at $0.60 per million input tokens, roughly 10 percent of the cost of Claude Opus.
Native Video Understanding: Processes complex video files without external frame extraction, enabling precise temporal analysis of long recordings.
Parallel Swarm Orchestration: The only open model trained to coordinate up to 100 sub-agents for massive, multi-threaded research workflows.

Limitations

Extreme Local VRAM Needs: Requires 632GB of VRAM for the full unquantized model, making local deployment impossible for most consumer users.
Higher Reasoning Latency: Thinking mode can introduce significant delays as the model generates internal logic chains before replying.
Formatting Repetition: May produce excessively long walls of text unless strictly prompted to use specific paragraph structures.
Data Residency Concerns: The primary infrastructure is based in China, which may present compliance issues for certain Western enterprises.
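The "roughly 10 percent" cost comparison can be checked against the list prices shown on this page. The sketch assumes that each price pair is input / output per 1M tokens ($0.60 / $3.00 for Kimi K2.5, $5.00 / $25.00 for Claude Opus 4.5). The `workloadCost` helper and the 1M-in / 200K-out workload are hypothetical.

```javascript
// List prices in USD per 1M tokens, as shown on this page.
// Assumption: the first figure is the input price, the second the output price.
const PRICES = {
  'kimi-k2.5': { input: 0.60, output: 3.00 },
  'claude-opus-4.5': { input: 5.00, output: 25.00 },
};

function workloadCost(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}

// Arbitrary example workload: 1M input tokens, 200K output tokens.
const kimi = workloadCost('kimi-k2.5', 1e6, 200000);
const opus = workloadCost('claude-opus-4.5', 1e6, 200000);
console.log(`${((kimi / opus) * 100).toFixed(1)}%`); // prints "12.0%"
```

Because both the input and output prices differ by the same factor, the ratio comes out at 12 percent for any mix of input and output tokens.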

API Quick Start

fireworks/kimi-k2p5

moonshot SDK
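// Requires the `openai` Node package (npm install openai).
import OpenAI from 'openai';

// The API key is read from the KIMI_API_KEY environment variable.
const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: 'https://api.moonshot.cn/v1',
});

async function main() {
  const res = await client.chat.completions.create({
    model: 'kimi-k2.5',
    messages: [
      { role: 'system', content: 'You are Kimi, a reasoning agent.' },
      { role: 'user', content: 'Design a parallel research plan for quantum computing trends.' }
    ],
    // `extra_body` is a Python SDK option. The Node SDK sends extra fields
    // with the request body, so `thinking` is passed at the top level.
    thinking: { type: 'enabled' }
  });
  console.log(res.choices[0].message.content);
}

// Surface network or authentication errors instead of failing silently.
main().catch(console.error);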

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Kimi K2.5

Kimi K2.5 costs almost 10 percent of what Opus costs at a similar performance level.
Odd_Tumbleweed574 (reddit)

People forget Nvidia lost 600 billion dollars when a Chinese lab open sourced something major. Kimi is doing that again with frontier intelligence.
chetaslua (twitter)

The Attention Residuals concept in K2.5 is the first architectural change in years that actually fixes the LLM forgetting problem.
logic_king (hackernews)

Workers AI runs big models now. Kimi K2.5 first. It is one of the best open source models out there, very good for coding too.
dok2001 (twitter)

Kimi K2.5 is a different beast. It is a smart incredible RP model, but it can get neurotic if you do not use community presets.
dptgreg (reddit)

I replaced my GPT 4 workflow with Kimi K2.5 because the thinking mode is more transparent and the context window handles my whole repo.
Dev_Max (reddit)

Related Videos

Watch tutorials, reviews, and discussions about Kimi K2.5

Kimi K2.5 beating GPT 5.2 with high thinking, absolutely destroying the other Frontier models.

It is the strongest open source coding model to date with 76.8 on SWE verified.

Agent swarm is a shift from single agent to multi agent executing parallel workflows across up to 1500 coordinated steps.

The context window is massive at 256k tokens which is plenty for most projects.

Moonshot is really pushing the boundaries of what open weights can do in early 2026.

It really nailed the whole Apple design aesthetic and produced a nice looking website with animations just from a video.

The Swarm feature looks very cool and it is definitely fun to use as it assigns ID badges to each sub agent.

K2.5 is much cheaper at 60 cents per million input tokens and 3 dollars per million output tokens.

The native video processing means you don't have to use expensive external tools to process frames.

This model is a game changer for developers who need autonomous agents on a budget.

Moonshot achieved this by giving each sub agent rewards at separate critical step stages to prevent serial collapse.

The model learns to choose parallelism only when it shortens this critical path, which is very clever innovation.

Kimi K2.5 is just around the edge of being able to run this on consumer hardware using GGUF.

The thinking mode is incredibly robust for solving complex logical errors in Python.

Seeing a 1 trillion parameter model released like this is huge for the open source community.
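One of the quotes above says the model chooses parallelism only when it shortens the critical path. That decision rule can be illustrated with simple arithmetic. The sketch below uses assumed definitions: serial time is the sum of step durations, and parallel time is a fixed orchestration overhead plus the slowest sub-agent. The function names are hypothetical, and this is not Moonshot's training objective.

```javascript
// Serial execution: steps run one after another.
function serialTime(stepDurations) {
  return stepDurations.reduce((sum, d) => sum + d, 0);
}

// Parallel execution: all steps run at once, so the critical path is the
// slowest step plus an assumed fixed cost for spawning and merging.
function parallelTime(stepDurations, overhead) {
  return overhead + Math.max(...stepDurations);
}

// Parallelize only when it shortens the critical path.
function shouldParallelize(stepDurations, overhead) {
  if (stepDurations.length === 0) return false;
  return parallelTime(stepDurations, overhead) < serialTime(stepDurations);
}

console.log(shouldParallelize([30, 45, 20], 10)); // true: 55 < 95
console.log(shouldParallelize([5], 10));          // false: 15 > 5
```

A single short step is better run serially because the orchestration overhead outweighs any gain.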

Pro Tips

Expert tips to help you get the most out of Kimi K2.5 and achieve better results.

Enable Thinking Mode

Pass the thinking parameter in your API request to reach maximum accuracy for math and coding tasks.

Trigger Agent Swarm

Instruct the model to deploy a swarm for research tasks to force parallel orchestration across sub-agents.

Optimize Temperature

Use a temperature of 1.0 for thinking mode to permit diverse reasoning but lower it to 0.6 for standard chat.
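The thinking and temperature tips can be combined in one request builder. This is a sketch: `buildChatRequest` is a hypothetical helper, the `thinking` field mirrors the API Quick Start snippet on this page, and `'disabled'` is assumed to be the counterpart of `'enabled'`.

```javascript
// Builds a chat-completions request body with the temperature
// suggested above for each mode.
function buildChatRequest(prompt, { thinking = true } = {}) {
  return {
    model: 'kimi-k2.5',
    messages: [{ role: 'user', content: prompt }],
    temperature: thinking ? 1.0 : 0.6,
    // Assumption: 'disabled' switches thinking off.
    thinking: { type: thinking ? 'enabled' : 'disabled' },
  };
}

console.log(buildChatRequest('Prove that the sum of two even numbers is even.'));
console.log(buildChatRequest('Say hello.', { thinking: false }));
```

The returned object can be passed to a chat-completions call in place of a hand-written body.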

Joint Vision Prompts

Upload error screenshots alongside code snippets to leverage the model's unified text-vision training.
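A joint prompt might be assembled as below. The sketch assumes the endpoint accepts OpenAI-style `image_url` content parts carrying a base64 data URL, which this page does not confirm. `buildScreenshotMessage` is a hypothetical helper.

```javascript
// Builds one user message that pairs a PNG screenshot with the related code.
// Assumption: the API accepts OpenAI-style multimodal content parts.
function buildScreenshotMessage(pngBase64, code, question) {
  return {
    role: 'user',
    content: [
      {
        type: 'image_url',
        image_url: { url: `data:image/png;base64,${pngBase64}` },
      },
      { type: 'text', text: `${question}\n\nCode:\n${code}` },
    ],
  };
}

// 'iVBORw0KGgo=' is a placeholder, not a complete image.
const msg = buildScreenshotMessage(
  'iVBORw0KGgo=',
  'const total = items.lenght;',
  'Why does this page show NaN?'
);
console.log(JSON.stringify(msg, null, 2));
```

Sending the screenshot and the code in the same message keeps both in one turn, so the model can relate the visible error to the source.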

Context Caching

Utilize context caching for repeated long documents to reduce input costs by up to 90 percent.
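To see what a cache discount means in dollars, the sketch below prices input tokens when a share of them is served from cache. The $0.60 per 1M rate comes from this page. The 90 percent discount is the page's stated upper bound, and applying it as a flat multiplier on cached tokens is an assumption. `inputCostWithCache` is a hypothetical name.

```javascript
const INPUT_PRICE_PER_M = 0.60; // USD per 1M input tokens, from this page

// Cost of `inputTokens`, of which `cachedTokens` are billed at a discount.
// Assumption: the discount is a flat fraction off the normal input price.
function inputCostWithCache(inputTokens, cachedTokens, discount = 0.9) {
  if (cachedTokens > inputTokens) {
    throw new RangeError('cachedTokens cannot exceed inputTokens');
  }
  const freshCost = ((inputTokens - cachedTokens) / 1e6) * INPUT_PRICE_PER_M;
  const cachedCost = (cachedTokens / 1e6) * INPUT_PRICE_PER_M * (1 - discount);
  return freshCost + cachedCost;
}

// Example: a 200K-token document sent 50 times (10M input tokens in total),
// with every call after the first served from cache.
console.log(inputCostWithCache(10e6, 0).toFixed(2));     // "6.00"
console.log(inputCostWithCache(10e6, 9.8e6).toFixed(2)); // "0.71"
```

In this example the saving is close to, but below, 90 percent because the first request is always billed at the full rate.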

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.

Related AI Models

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context, $3.00 / $15.00 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25 / $10.00 per 1M tokens

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context, $5.00 / $25.00 per 1M tokens

Gemini 3.1 Flash-Lite (Google)
Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.
1M context, $0.25 / $1.50 per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context, $0.40 / $2.40 per 1M tokens

GLM-5 (Zhipu)
GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.
200K context, $1.00 / $3.20 per 1M tokens

GPT-5.2 (OpenAI)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
400K context, $1.75 / $14.00 per 1M tokens

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context, $3.00 / $15.00 per 1M tokens

Frequently Asked Questions

Find answers to common questions about Kimi K2.5