How much does Kimi K2.5 cost to use?

Input tokens cost $0.60 per million and output tokens cost $3.00 per million. This pricing makes it one of the most affordable frontier-class models available.
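To turn these rates into a per-request figure, multiply each token count by its per-million price. The JavaScript sketch below uses the list prices quoted on this page. `KIMI_K25_PRICES` and `estimateCostUSD` are hypothetical names. The estimate ignores cached-input discounts, taxes, and any provider-specific rounding.

```javascript
// Per-million-token list prices quoted on this page (USD); verify before relying on them.
const KIMI_K25_PRICES = { inputPerMillion: 0.6, outputPerMillion: 3.0 };

// Estimate the cost of one request from its input and output token counts.
function estimateCostUSD(inputTokens, outputTokens, prices = KIMI_K25_PRICES) {
  const inputCost = (inputTokens / 1_000_000) * prices.inputPerMillion;
  const outputCost = (outputTokens / 1_000_000) * prices.outputPerMillion;
  return inputCost + outputCost;
}

// Example: a 200K-token prompt that produces a 10K-token answer.
console.log(estimateCostUSD(200_000, 10_000).toFixed(2)); // prints 0.15
```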

What is the maximum context length for Kimi K2.5?

Kimi K2.5 supports a 256K-token context window (262,144 tokens). That is enough to process an entire book or a large codebase in a single prompt.
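Before sending a large file, a rough pre-flight check helps avoid overflowing the window. This sketch rests on three assumptions:

- The window is 262,144 tokens.
- Reserved output tokens count against the same window, as they do for most chat APIs.
- English text and code average about four characters per token.

That last ratio is a generic heuristic, not a property of Kimi's tokenizer. For anything close to the limit, rely on the token usage the API reports. The function names are hypothetical.

```javascript
// Assumption: "256K" means 256 × 1024 = 262,144 tokens.
const CONTEXT_WINDOW = 262_144;

// Generic heuristic (~4 characters per token); Kimi's real tokenizer will differ.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the estimated prompt plus the tokens reserved for the answer fit in the window.
function fitsInContext(promptText, reservedOutputTokens) {
  return estimateTokens(promptText) + reservedOutputTokens <= CONTEXT_WINDOW;
}

const prompt = "x".repeat(800_000); // about 200K tokens under the heuristic
console.log(fitsInContext(prompt, 32_000)); // prints true
console.log(fitsInContext(prompt, 66_000)); // prints false
```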

Can Kimi K2.5 process video files?

Yes, it features a native MoonViT-3D encoder for processing long video content. It can analyze hours of footage for events, summaries, and visual details.

Is Kimi K2.5 open source?

Moonshot AI has released the model weights under a modified MIT License. This allows developers to host the model on their own infrastructure.

What is the Agent Swarm feature?

It is an orchestration mode where the model manages up to 100 parallel sub-agents. This is used for tasks that require high-concurrency research or multi-file editing.

How does Kimi K2.5 compare to Claude 3.7 Sonnet?

Kimi K2.5 offers comparable reasoning capabilities, adds native video input, and costs far less. It charges $0.60/$3.00 per million input/output tokens, versus $3.00/$15.00 for Claude 3.7 Sonnet. It also offers the Agent Swarm parallel orchestration mode.

What hardware is required to run Kimi K2.5 locally?

The full unquantized model requires approximately 632GB of VRAM. Most local users will need to use quantized versions on high-end consumer hardware.
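The memory needed just to hold the weights follows from simple arithmetic: parameter count times bits per weight, divided by eight. In a standard deployment a MoE model keeps every expert resident, even though only a subset runs per token. The full 1 trillion parameters therefore count here, not the 32 billion active ones. This sketch covers weights only; KV cache, activations, and framework overhead add more. It does not try to reproduce the 632GB figure, because the page does not state that figure's precision or overhead assumptions.

```javascript
// Weights-only estimate: parameters × bits per weight ÷ 8 bits per byte, in decimal GB.
// Excludes KV cache, activations, and runtime overhead.
function weightMemoryGB(numParams, bitsPerWeight) {
  return (numParams * bitsPerWeight) / 8 / 1e9;
}

const TOTAL_PARAMS = 1e12; // 1T total parameters; all experts must be loaded

for (const bits of [16, 8, 4, 2]) {
  console.log(`${bits}-bit: ~${weightMemoryGB(TOTAL_PARAMS, bits)} GB`);
}
// prints 16-bit: ~2000 GB, 8-bit: ~1000 GB, 4-bit: ~500 GB, 2-bit: ~250 GB
```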

How do I access the Thinking mode via API?

Add a thinking object with type set to enabled to the request body. In the Python OpenAI SDK this goes in extra_body; in the Node.js SDK and in raw HTTP requests, thinking is a top-level field. Thinking mode improves performance on logic-heavy tasks.
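Raw HTTP has no `extra_body` option. The Python SDK simply merges it into the JSON payload, so `thinking` ends up as a top-level field. The sketch below builds such a payload and sends it with the built-in `fetch` in Node.js 18+. It assumes the OpenAI-compatible `/chat/completions` path under the Moonshot base URL used in the API Quick Start. It also assumes that `{ type: "disabled" }` turns thinking off; this page only documents `"enabled"`. `buildChatRequest` and `sendChatRequest` are hypothetical names.

```javascript
// Build a Chat Completions payload with the thinking field at the top level.
// Assumption: { type: "disabled" } switches thinking off; only "enabled" is documented here.
function buildChatRequest(messages, { thinking = true, model = "kimi-k2.5" } = {}) {
  return {
    model,
    messages,
    thinking: { type: thinking ? "enabled" : "disabled" },
  };
}

// Send the payload with Node.js 18+ built-in fetch (defined here, not called).
async function sendChatRequest(body, apiKey, baseURL = "https://api.moonshot.cn/v1") {
  const res = await fetch(`${baseURL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  return res.json();
}

const body = buildChatRequest([{ role: "user", content: "Prove that the square root of 2 is irrational." }]);
console.log(JSON.stringify(body.thinking)); // prints {"type":"enabled"}
// Usage: sendChatRequest(body, process.env.KIMI_API_KEY).then((r) => console.log(r.choices[0].message.content));
```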

Kimi K2.5

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

Agentic AI, Multimodal, Open Source, Reasoning, MoE

Moonshot AI · Kimi · January 27, 2026

Context: 256K tokens
Max Output: 66K tokens
Input Price: $0.60 / 1M tokens
Output Price: $3.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning

Benchmarks

GPQA: 87.6%
HLE: 50.2%
MMLU: 91.5%
MMLU Pro: 87.1%
SimpleQA: 48%
IFEval: 85%
AIME 2025: 96.1%
MATH: 90.1%
GSM8k: 97.1%
MGSM: 95%
MathVista: 90.1%
SWE-Bench: 76.8%
HumanEval: 88%
LiveCodeBench: 85%
MMMU: 78.5%
MMMU Pro: 78.5%
ChartQA: 77.5%
DocVQA: 88.8%
Terminal-Bench: 50.8%
ARC-AGI: 12%


About Kimi K2.5

Learn about Kimi K2.5's capabilities, features, and how it can help you achieve better results.

Kimi K2.5 is an open-source multimodal model from Moonshot AI. It uses a 1-trillion-parameter Mixture-of-Experts (MoE) architecture in which 32 billion parameters are active per token. It processes text, images, and video within a single reasoning framework rather than relying on separate external encoders for each modality. The model handles up to 256K (262,144) tokens of context while maintaining high retrieval accuracy and logical consistency across very long sequences.
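The 32-billion-of-1-trillion split means roughly 3.2 percent of the weights do work for any given token. The toy router below shows the general top-k gating idea behind that: score all experts, run only the k best, and weight their outputs by a softmax over the selected scores. It is a generic illustration, not Kimi K2.5's router. This page does not describe the real model's expert count, k, or shared experts, and the logits and k = 2 used here are made up.

```javascript
// Toy MoE router: rank experts by router logit, keep the top k,
// and renormalize their gate weights with a softmax over the kept experts.
function routeToken(routerLogits, k) {
  const ranked = routerLogits
    .map((logit, expert) => ({ expert, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const maxLogit = ranked[0].logit; // subtract the max for numerical stability
  const exps = ranked.map(({ logit }) => Math.exp(logit - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return ranked.map(({ expert }, i) => ({ expert, weight: exps[i] / total }));
}

// Only the selected experts run, so per-token compute tracks active, not total, parameters.
function activeFraction(activeParams, totalParams) {
  return activeParams / totalParams;
}

const routed = routeToken([0.1, 2.0, -1.0, 1.5, 0.3, 1.9], 2);
console.log(routed.map((r) => r.expert)); // prints [ 1, 5 ]
console.log(activeFraction(32e9, 1e12)); // prints 0.032
```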

The model's distinguishing feature is Agent Swarm, which lets it coordinate up to 100 parallel sub-agents that execute complex research or engineering tasks simultaneously. With an integrated 400M-parameter MoonViT-3D encoder, K2.5 can analyze several hours of video content with temporal precision. It is designed for autonomous execution and outperforms many proprietary models on agentic benchmarks such as SWE-Bench and BrowseComp.
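Agent Swarm itself runs inside the model, and this page does not describe how Moonshot schedules sub-agents. As a rough client-side analogue, the sketch below fans a research job out to many async workers. It caps concurrency at 100 and keeps results in input order. `researchSubtopic` is a hypothetical placeholder that only waits 10 ms; a real version would call the model API for its subtopic.

```javascript
// Run async task functions with at most `limit` in flight, preserving result order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim a task index; safe because JS runs this synchronously
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical sub-agent: a real one would send its subtopic to the model API.
async function researchSubtopic(topic) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `notes on ${topic}`;
}

const MAX_SUB_AGENTS = 100; // mirrors the page's "up to 100 sub-agents"
const topics = Array.from({ length: 250 }, (_, i) => `subtopic ${i + 1}`);
runWithConcurrency(topics.map((t) => () => researchSubtopic(t)), MAX_SUB_AGENTS)
  .then((notes) => console.log(notes.length, notes[0])); // prints 250 notes on subtopic 1
```

Workers pull from a shared index rather than splitting the list up front. A slow subtopic then occupies only one slot while the other workers keep draining the queue.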

Kimi K2.5 provides a dedicated Thinking mode for tasks requiring deep logic. When enabled, the model generates an internal chain of reasoning to self-correct and verify steps before producing a final answer. This makes it highly effective for competition-level mathematics and large-scale software development. Its token economics are optimized for enterprise deployment, offering frontier-level intelligence at a fraction of the cost of competing closed-source systems.

Use Cases

Discover the different ways you can use Kimi K2.5 to achieve great results.

Autonomous Software Engineering

Solving complex GitHub issues and building multi-file project architectures, the kind of work measured by SWE-Bench.

Visual Web Development

Creating functional frontend code and UI designs directly from screen recordings of existing website interactions.

Multi-Threaded Research

Using Agent Swarm to crawl and synthesize information from over 100 sources in a single parallel workflow.

Long Video Analysis

Extracting specific events and temporal data from hours of security or lecture footage without frame extraction tools.

Mathematical Proof Generation

Applying Thinking mode to olympiad-level math problems; the model scores 96.1% on AIME 2025.

Enterprise Document Automation

Generating multi-page PDF reports and complex financial spreadsheets from unstructured business data sources.

Strengths

Elite Agentic Performance: Scores 76.8% on SWE-Bench Verified, outperforming many proprietary frontier models on software engineering tasks.

Unmatched Token Economics: Provides 1T-parameter MoE intelligence at $0.60 per million input tokens and $3.00 per million output tokens. That is roughly 12 percent of Claude Opus 4.5's list price of $5.00 and $25.00.

Native Video Understanding: Processes complex video files without external frame extraction, enabling precise temporal analysis of long recordings.

Parallel Swarm Orchestration: The only open model trained to coordinate up to 100 sub-agents for massive, multi-threaded research workflows.

Limitations

Extreme Local VRAM Needs: Requires about 632GB of VRAM for the full unquantized model, which puts local deployment out of reach for most consumer users.

Higher Reasoning Latency: Thinking mode can introduce significant delays as the model generates internal reasoning chains before replying.

Verbose Formatting: May produce excessively long walls of text unless explicitly prompted to follow a specific paragraph structure.

Data Residency Concerns: The primary infrastructure is based in China, which may present compliance issues for some Western enterprises.
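The cost comparison with Claude Opus can be checked against the list prices in the Related AI Models section. That section gives $5.00/$25.00 per million input/output tokens for Claude Opus 4.5. The sketch divides one price by the other for each token type. The object keys are labels chosen for this example, not official API model IDs.

```javascript
// List prices in USD per million tokens, as quoted on this page.
// Keys are illustrative labels, not official API model IDs.
const PRICES = {
  "kimi-k2.5": { input: 0.6, output: 3.0 },
  "claude-opus-4.5": { input: 5.0, output: 25.0 },
};

// Ratio of one model's price to a baseline's, separately for input and output tokens.
function priceRatio(model, baseline) {
  return {
    input: PRICES[model].input / PRICES[baseline].input,
    output: PRICES[model].output / PRICES[baseline].output,
  };
}

const r = priceRatio("kimi-k2.5", "claude-opus-4.5");
console.log(`${(r.input * 100).toFixed(0)}% input, ${(r.output * 100).toFixed(0)}% output`);
// prints 12% input, 12% output
```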

API Quick Start

Model ID on Fireworks: fireworks/kimi-k2p5

Moonshot API example (OpenAI Node.js SDK):

import OpenAI from 'openai';

// Moonshot's API is OpenAI-compatible, so the official SDK works with a custom baseURL.
const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: 'https://api.moonshot.cn/v1',
});

async function main() {
  const res = await client.chat.completions.create({
    model: 'kimi-k2.5',
    messages: [
      { role: 'system', content: 'You are Kimi, a reasoning agent.' },
      { role: 'user', content: 'Design a parallel research plan for quantum computing trends.' },
    ],
    // extra_body exists only in the Python SDK. The Node SDK sends unrecognized
    // fields as-is, so thinking goes at the top level of the request.
    thinking: { type: 'enabled' },
  });
  console.log(res.choices[0].message.content);
}

main().catch(console.error);

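Install the SDK with npm install openai, set the KIMI_API_KEY environment variable, and run the example as an ES module (for example, saved as a .mjs file).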

Community Feedback

See what the community thinks about Kimi K2.5

“Kimi K2.5 costs almost 10 percent of what Opus costs at a similar performance level.”

— Odd_Tumbleweed574

“People forget Nvidia lost 600 billion dollars when a Chinese lab open sourced something major. Kimi is doing that again with frontier intelligence.”

— chetaslua

twitter

“The Attention Residuals concept in K2.5 is the first architectural change in years that actually fixes the LLM forgetting problem.”

— logic_king

hackernews

“Workers AI runs big models now. Kimi K2.5 first. It is one of the best open source models out there, very good for coding too.”

— dok2001

twitter

“Kimi K2.5 is a different beast. It is a smart incredible RP model, but it can get neurotic if you do not use community presets.”

— dptgreg

“I replaced my GPT 4 workflow with Kimi K2.5 because the thinking mode is more transparent and the context window handles my whole repo.”

— Dev_Max

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents

Web Automation

Smart Workflows


Pro Tips

Expert tips to help you get the most out of Kimi K2.5 and achieve better results.

Enable Thinking Mode

Pass the thinking parameter in your API request to reach maximum accuracy for math and coding tasks.

Trigger Agent Swarm

Instruct the model to deploy a swarm for research tasks to force parallel orchestration across sub-agents.

Optimize Temperature

Use a temperature of 1.0 for thinking mode to permit diverse reasoning but lower it to 0.6 for standard chat.
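A small helper keeps the temperature paired with the mode. This sketch hard-codes the two values from this tip. It adds the `thinking` field only when thinking is requested; whether omitting the field turns thinking off depends on the provider's default, so check the API documentation. `samplingParams` is a hypothetical name.

```javascript
// Temperatures suggested in this tip: 1.0 with thinking, 0.6 for standard chat.
function samplingParams(thinkingEnabled) {
  const params = { temperature: thinkingEnabled ? 1.0 : 0.6 };
  if (thinkingEnabled) {
    // Sent only when requested; otherwise the provider's default mode applies.
    params.thinking = { type: "enabled" };
  }
  return params;
}

const request = {
  model: "kimi-k2.5",
  messages: [{ role: "user", content: "Summarize this changelog." }],
  ...samplingParams(false),
};
console.log(request.temperature, "thinking" in request); // prints 0.6 false
```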

Joint Vision Prompts

Upload error screenshots alongside code snippets to leverage the model's unified text-vision training.
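In the OpenAI-compatible message format, a single user message can carry both an image part and a text part. The sketch assumes the Kimi K2.5 endpoint accepts `image_url` parts with base64 data URLs, as OpenAI-compatible vision endpoints commonly do; confirm this in the provider's documentation. The base64 string in the usage line is a truncated placeholder, not a valid image. `buildVisionDebugMessage` is a hypothetical name.

```javascript
// Pair a screenshot with the code that produced it in one user message,
// using OpenAI-style content parts ("image_url" and "text").
// Assumption: the endpoint accepts base64 data URLs in image_url parts.
function buildVisionDebugMessage(imageBase64, mimeType, codeSnippet, question) {
  return {
    role: "user",
    content: [
      { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
      { type: "text", text: `${question}\n\nRelevant code:\n${codeSnippet}` },
    ],
  };
}

// Placeholder base64 (not a real image); in practice, read and encode the screenshot file.
const message = buildVisionDebugMessage(
  "iVBORw0KGgo=",
  "image/png",
  "const total = order.items.reduce((sum, i) => sum + i.price, 0);",
  "Why does this TypeError appear in the screenshot?"
);
console.log(message.content[0].image_url.url); // prints data:image/png;base64,iVBORw0KGgo=
```

The resulting object goes into the `messages` array of a normal chat completion request, alongside any system message.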

Context Caching

Utilize context caching for repeated long documents to reduce input costs by up to 90 percent.
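A discount on cached tokens applies only to the cached share of each prompt, so the effective savings depend on the cache hit ratio. This sketch assumes cached tokens are billed at a flat reduced rate. The $0.06 cached price (a 90 percent discount) and the 80 percent hit ratio are hypothetical inputs; substitute the provider's published cache-hit price.

```javascript
// Blended input price per million tokens when a fraction of the prompt hits the cache.
// Assumption: cached tokens are billed at a single flat discounted rate.
function blendedInputPrice(basePricePerMillion, cachedPricePerMillion, cacheHitRatio) {
  return (1 - cacheHitRatio) * basePricePerMillion + cacheHitRatio * cachedPricePerMillion;
}

// Hypothetical inputs: $0.60 base (from this page), $0.06 cached, 80% of input tokens cached.
const blended = blendedInputPrice(0.6, 0.06, 0.8);
console.log(blended.toFixed(3)); // prints 0.168
```

With these inputs the blended rate is $0.168 per million, about 72 percent below the uncached $0.60. The full 90 percent is reached only when the entire prompt is served from cache.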

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context

$3.00/$15.00/1M

GPT-5.1

OpenAI

GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

400K context

$1.25/$10.00/1M

Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context

$5.00/$25.00/1M

Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context

$0.25/$1.50/1M

Qwen3.5-397B-A17B

Alibaba

Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...

1M context

$0.40/$2.40/1M

Claude Fable 5

Anthropic

Anthropic's Claude Fable 5 is a Mythos-class model featuring a 1M context window and 128K output tokens. It excels at agentic coding and 3D physics.

1M context

$10.00/$50.00/1M

GLM-5.1

Zhipu (GLM)

GLM-5.1 is Zhipu AI's flagship reasoning model, featuring a 202K context window and an autonomous 8-hour execution loop for complex agentic engineering.

203K context

$1.40/$4.40/1M

Qwen3.6-Max-Preview

Alibaba

Qwen3.6-Max-Preview is Alibaba's flagship MoE model featuring 1M context, a native thinking mode, and SOTA scores in agentic coding and reasoning.

1M context

$1.25/$10.00/1M
