Kimi K2.5

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

Tags: Agentic AI, Multimodal, Open Source, Reasoning, MoE
Moonshot | Kimi K-series | January 27, 2026
Context: 262K tokens
Max Output: 33K tokens
Input Price: $0.60 / 1M tokens
Output Price: $2.50 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
87.6%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi K2.5 scored 87.6% on this benchmark.
HLE
50.2%
HLE: Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge. Kimi K2.5 scored 50.2% on this benchmark.
MMLU
92%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi K2.5 scored 92% on this benchmark.
MMLU Pro
87.1%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi K2.5 scored 87.1% on this benchmark.
SimpleQA
54%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability in knowledge retrieval and how often the model hallucinates an answer. Kimi K2.5 scored 54% on this benchmark.
IFEval
94%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi K2.5 scored 94% on this benchmark.
AIME 2025
96.1%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi K2.5 scored 96.1% on this benchmark.
MATH
98%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi K2.5 scored 98% on this benchmark.
GSM8k
99%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi K2.5 scored 99% on this benchmark.
MGSM
96%
MGSM: Multilingual Grade School Math. 250 GSM8k problems translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi K2.5 scored 96% on this benchmark.
MathVista
84.2%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi K2.5 scored 84.2% on this benchmark.
SWE-Bench
76.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi K2.5 scored 76.8% on this benchmark.
HumanEval
99%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi K2.5 scored 99% on this benchmark.
LiveCodeBench
85%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi K2.5 scored 85% on this benchmark.
MMMU
84%
MMMU: Massive Multi-discipline Multimodal Understanding. Tests vision-language models on college-level problems across 30 subjects that require both image understanding and expert knowledge. Kimi K2.5 scored 84% on this benchmark.
MMMU Pro
78.5%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi K2.5 scored 78.5% on this benchmark.
ChartQA
77.5%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi K2.5 scored 77.5% on this benchmark.
DocVQA
88.8%
DocVQA: Document Visual Question Answering. Tests the ability to extract and reason about information from document images including forms, reports, and scanned text. Kimi K2.5 scored 88.8% on this benchmark.
Terminal-Bench
50.8%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi K2.5 scored 50.8% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction and Reasoning Corpus for AGI. Tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi K2.5 scored 12% on this benchmark.

About Kimi K2.5

Learn about Kimi K2.5's capabilities, features, and how it can help you achieve better results.

A New Frontier in Agentic Intelligence

Kimi K2.5 is a flagship open-source agentic model from Moonshot AI, representing a major leap in unified multimodal intelligence. Built on a massive 1-trillion parameter Mixture-of-Experts (MoE) architecture with 32 billion active parameters, it natively integrates text, image, and video processing into a single reasoning framework. Unlike traditional LLMs, K2.5 is designed specifically for autonomous execution, featuring a unique 'Thinking' mode that allows it to self-correct and reason through complex, multi-step problems without human intervention.

Architectural Breakthroughs

The model introduces a feature known as 'Agent Swarm,' which enables the system to dynamically coordinate up to 100 parallel sub-agents to solve large research or engineering tasks. With top-tier scores on benchmarks such as SWE-Bench (76.8%) and AIME 2025 (96.1%), Kimi K2.5 narrows the gap between open-source models and proprietary frontier AI at a fraction of the operational cost. Its MoonViT-3D encoder supports long-form video understanding, covering clips up to two hours long with high temporal accuracy.

Unmatched Efficiency

Beyond raw power, K2.5 focuses on sustainable token economics. By utilizing aggressive context caching and a highly optimized MoE structure, it delivers performance that rivals the most expensive proprietary models while maintaining a highly competitive price point of $0.60 per million input tokens. This makes it an ideal backbone for enterprises looking to deploy complex, long-context autonomous agents at scale.
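To make the token economics concrete, here is a minimal sketch that estimates the cost of a single request from the list prices quoted on this page ($0.60 input and $2.50 output per 1M tokens). The helper name `estimateCostUSD` is hypothetical, and the figures ignore caching discounts or provider-specific surcharges.

```javascript
// List prices quoted on this page, in USD per 1M tokens.
const PRICE_PER_MILLION = { input: 0.60, output: 2.50 };

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(inputTokens, outputTokens, price = PRICE_PER_MILLION) {
  const inputCost = (inputTokens / 1e6) * price.input;
  const outputCost = (outputTokens / 1e6) * price.output;
  return inputCost + outputCost;
}

// Example: a 200K-token prompt with a 4K-token answer.
const exampleCost = estimateCostUSD(200000, 4000);
console.log(exampleCost.toFixed(4)); // prints "0.1300"
```

At these rates, a long-context call that nearly fills the window still costs on the order of cents, which is the point the paragraph above is making.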


Use Cases for Kimi K2.5

Discover the different ways you can use Kimi K2.5 to achieve great results.

Autonomous Software Engineering

Resolving complex GitHub issues and performing full-stack website cloning from visual UI sketches.

Olympiad-Level Math Solving

Tackling advanced mathematical proofs and competition-level problems with over 96% accuracy on AIME 2025.

Long-Form Video Reasoning

Analyzing and summarizing content from videos up to two hours long without context loss or temporal degradation.

Dynamic Research Agents

Using 'Agent Swarm' to conduct multi-threaded web research and synthesize data from hundreds of sources in parallel.

Aesthetic Frontend Generation

Converting hand-drawn UI wireframes or screenshots into polished, functional React code with expressive motion.

Autonomous Terminal Control

Executing complex bash commands and system-level operations to manage server clusters and development environments.

Strengths

Elite Mathematical Reasoning: Scoring 96.1% on AIME 2025, it rivals the strongest proprietary models on competition math.
Massive Parallelism: The 'Agent Swarm' capability coordinates up to 100 sub-agents, drastically reducing time-to-completion for research tasks.
Unified Multimodal Architecture: Natively processes 2-hour videos and high-res images in a single model via its integrated MoonViT-3D encoder.
Aggressive Token Economics: At $0.60/1M input and $2.50/1M output tokens, it is roughly 8-10x cheaper than a frontier model like Claude Opus 4.5 ($5.00/$25.00 per 1M).

Limitations

Hardware Intensive: Running the full 1T model locally requires an enterprise-grade AI cluster with multiple H100 or B200 GPUs.
Thinking Latency: Activating the deep reasoning mode significantly increases the time-to-first-token compared to standard processing.
PhD-Level Knowledge Gap: Its 50.2% on 'Humanity's Last Exam' shows room for improvement in high-level scientific expertise.
Regulation Concerns: As a Chinese model, API usage and data sovereignty may be subject to different regulatory frameworks for Western enterprises.

API Quick Start

fireworks/kimi-k2p5

moonshot SDK
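// The Moonshot endpoint is OpenAI-compatible, so the official OpenAI SDK is used here.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY, // read the key from the environment
  baseURL: 'https://api.moonshot.cn/v1'
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'kimi-k2.5',
    messages: [{ role: 'user', content: 'Create a full-stack Next.js dashboard with a dark mode glassmorphism UI.' }],
    max_tokens: 2048, // cap on generated tokens
  });
  // Print the text of the first completion choice.
  console.log(response.choices[0].message.content);
}

// Surface API or network errors instead of leaving the rejection unhandled.
main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});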

Install the SDK and start making API calls in minutes.
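
Because the model is listed with image input, a request can also carry an image alongside the text prompt. The sketch below builds such a payload using the OpenAI-style content-parts format (`type: 'text'` and `type: 'image_url'`). Whether Moonshot's endpoint accepts this exact shape for `kimi-k2.5` is an assumption to verify against the provider documentation, and `buildImageRequest` is a hypothetical helper name.

```javascript
// Build a chat request that pairs a text prompt with one image.
// Assumption: the endpoint accepts OpenAI-style content parts.
function buildImageRequest(prompt, imageUrl, maxTokens = 2048) {
  return {
    model: 'kimi-k2.5',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: maxTokens,
  };
}

// Example payload; the URL is a placeholder.
const imageRequest = buildImageRequest(
  'Turn this wireframe into a React component.',
  'https://example.com/wireframe.png'
);
console.log(JSON.stringify(imageRequest, null, 2));
```

The resulting object can be passed to `client.chat.completions.create(...)` in place of the text-only request from the quick start.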

What People Are Saying About Kimi K2.5

See what the community thinks about Kimi K2.5

"The reasoning capabilities on AIME 2025 are absolutely insane for an open model."
LogicLover (Reddit)

"Kimi K2.5 just set the new bar for long video understanding. Finally a model that doesn't forget the start of the clip."
AI_Pioneer (X)

"Using K2.5 as a coding agent is a game changer. Its SWE-Bench score isn't just a number, you can feel the competence."
DevGuru (Hacker News)

"China just released Kimi K2.5 and like clockwork the performance is on par with American frontier AI models."
BasedTorba (X)

"Kimi from China just destroyed OpenAI's trillion business dream... 8x cheaper."
nrqa__ (X)

"Kimi K2.5 is the first model that actually feels like a co-pilot rather than just a chat box."
CodeWizard (Reddit)

Videos About Kimi K2.5

Watch tutorials, reviews, and discussions about Kimi K2.5

Testing the AIME problems, Kimi K2.5 got almost everything right, even the ones GPT-4o struggled with.

For coding tasks, the agentic capabilities are clearly where this model shines compared to standard LLMs.

The open-source nature of a trillion-parameter model like this is unprecedented in the current market.

You're seeing logic processing here that rivaled o1 in my initial math tests.

The token pricing is so low it effectively kills the argument for using proprietary closed models for basic tasks.

The ability to process two-hour videos in one go without losing context is a massive breakthrough.

It's not just a chat model; it's designed from the ground up to use tools and terminals.

When you trigger the Swarm mode, the parallelism for web research is basically unmatched.

This is Moonshot AI putting the world on notice that they have the compute and the talent.

Seeing it navigate a live terminal to fix a bug is the future of autonomous engineering.

Kimi K2.5's jump in the BrowseComp benchmark suggests it can navigate the web with a level of persistence we haven't seen.

The fact that it's unifying vision and thinking modes into one architecture is the real architectural story here.

Performance on MMLU and GSM8k proves that the data quality used for training was top-tier.

Unlike previous versions, the video understanding here doesn't suffer from temporal degradation.

If you're a developer, the OpenAI compatibility makes switching to this model for testing almost zero-effort.


Pro Tips for Kimi K2.5

Expert tips to help you get the most out of Kimi K2.5 and achieve better results.

Leverage Thinking Mode

Explicitly prompt the model with 'Think step-by-step' to activate its reasoning mode for logic-heavy math or coding tasks.
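
As a small illustration of this tip, the sketch below prepends the step-by-step cue to a task before sending it. The helper name `buildReasoningMessages` is hypothetical, and whether the cue actually switches the model into its reasoning mode is the page's claim, not something this snippet verifies.

```javascript
// Hypothetical helper: prefix the user's task with the step-by-step cue.
function buildReasoningMessages(task) {
  return [
    { role: 'user', content: `Think step-by-step.\n\n${task}` },
  ];
}

// Example: wrap a math question before passing it as `messages`.
const reasoningMessages = buildReasoningMessages(
  'How many positive divisors does 360 have?'
);
console.log(reasoningMessages[0].content);
```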

Video Context Advantage

Use the model's MoonViT-3D encoder to process extremely long videos; it excels at finding specific details in 2-hour clips.

Agent Orchestration

For large projects, utilize the swarm capability to let K2.5 break down tasks into sub-tasks for faster execution.
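
The page does not document how 'Agent Swarm' is invoked, so the following is only a client-side approximation of the same idea: split a large job into sub-tasks and run them in parallel batches. The names `planBatches`, `runInBatches`, and `runSubTask` are hypothetical, and the default of 100 simply mirrors the sub-agent ceiling quoted on this page rather than a verified API limit.

```javascript
// Split tasks into batches of at most `maxParallel` items.
function planBatches(tasks, maxParallel = 100) {
  if (!Number.isInteger(maxParallel) || maxParallel < 1) {
    throw new RangeError('maxParallel must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < tasks.length; i += maxParallel) {
    batches.push(tasks.slice(i, i + maxParallel));
  }
  return batches;
}

// Run each batch in parallel, one batch after another.
// `runSubTask` is a caller-supplied async function, e.g. one chat
// completion per sub-task.
async function runInBatches(tasks, runSubTask, maxParallel = 100) {
  const results = [];
  for (const batch of planBatches(tasks, maxParallel)) {
    // allSettled keeps going when individual sub-tasks fail.
    const settled = await Promise.allSettled(batch.map((t) => runSubTask(t)));
    results.push(...settled);
  }
  return results;
}

console.log(planBatches(['a', 'b', 'c'], 2)); // [ [ 'a', 'b' ], [ 'c' ] ]
```

Using `Promise.allSettled` rather than `Promise.all` means one failed sub-task does not discard the results of the others, which matters when fanning out to many sources.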

Cache Hit Savings

Structure your API calls to take advantage of Moonshot's aggressive context caching to reduce input costs by up to 75%.
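
To see what that saving looks like in practice, here is a rough sketch of input cost with and without cache hits. It assumes cached input tokens are billed at 25% of the $0.60/1M list price (the "up to 75%" figure above taken as a flat discount); actual cache pricing and eligibility rules may differ, and `inputCostWithCache` is a hypothetical helper.

```javascript
// List input price from this page, USD per 1M tokens.
const INPUT_PRICE_PER_MILLION = 0.60;
// Assumption: cache hits get a flat 75% discount.
const CACHED_DISCOUNT = 0.75;

// Estimate input cost in USD when `cachedTokens` of the prompt hit the cache.
function inputCostWithCache(totalInputTokens, cachedTokens) {
  // Clamp so the cached share never exceeds the prompt size.
  const cached = Math.min(Math.max(cachedTokens, 0), totalInputTokens);
  const fresh = totalInputTokens - cached;
  const cachedPrice = INPUT_PRICE_PER_MILLION * (1 - CACHED_DISCOUNT);
  return (fresh * INPUT_PRICE_PER_MILLION + cached * cachedPrice) / 1e6;
}

// A 100K-token prompt where an 80K-token shared prefix is cached.
console.log(inputCostWithCache(100000, 0).toFixed(3));     // prints "0.060"
console.log(inputCostWithCache(100000, 80000).toFixed(3)); // prints "0.024"
```

Under this assumption, keeping the stable part of the prompt (system instructions, reference documents) at the front so it can be reused across calls cuts the input bill for that example by 60%.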


Related AI Models

Grok-4 (xAI)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context, $3.00/$15.00 per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25/$10.00 per 1M tokens

Claude Opus 4.5 (Anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context, $5.00/$25.00 per 1M tokens

GLM-4.7 (Zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context, $0.60/$2.20 per 1M tokens

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context, $0.50/$3.00 per 1M tokens

Claude 3.7 Sonnet (Anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.
200K context, $3.00/$15.00 per 1M tokens

Grok-3 (xAI)
Grok-3 is xAI's flagship reasoning model, featuring deep logic deduction, a 128k context window, and real-time integration with X for live research and coding.
128K context, $3.00/$15.00 per 1M tokens

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context, $0.28/$0.42 per 1M tokens

Frequently Asked Questions About Kimi K2.5

Find answers to common questions about Kimi K2.5