
Kimi K2 Thinking

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and can execute up to 300 sequential tool calls autonomously for...

Moonshot AI · Kimi K2 · Released November 6, 2025
Context: 256K tokens
Max Output: 16K tokens
Input Price: $0.60 / 1M tokens
Output Price: $2.50 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming, Reasoning
Benchmarks
GPQA
84.5%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi K2 Thinking scored 84.5% on this benchmark.
HLE
44.9%
HLE: Humanity's Last Exam. A frontier-difficulty benchmark of expert-written questions spanning a wide range of academic domains, built to remain challenging as models saturate older tests. Kimi K2 Thinking scored 44.9% on this benchmark.
MMLU
89.4%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi K2 Thinking scored 89.4% on this benchmark.
MMLU Pro
84.6%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi K2 Thinking scored 84.6% on this benchmark.
SimpleQA
48%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Kimi K2 Thinking scored 48% on this benchmark.
IFEval
88.3%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi K2 Thinking scored 88.3% on this benchmark.
AIME 2025
94.5%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi K2 Thinking scored 94.5% on this benchmark.
MATH
94.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi K2 Thinking scored 94.1% on this benchmark.
GSM8k
98.2%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi K2 Thinking scored 98.2% on this benchmark.
MGSM
91.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi K2 Thinking scored 91.5% on this benchmark.
MathVista
36.8%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi K2 Thinking scored 36.8% on this benchmark.
SWE-Bench
71.3%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi K2 Thinking scored 71.3% on this benchmark.
HumanEval
99%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi K2 Thinking scored 99% on this benchmark.
LiveCodeBench
83.1%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi K2 Thinking scored 83.1% on this benchmark.
MMMU
65.8%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Kimi K2 Thinking scored 65.8% on this benchmark.
MMMU Pro
62.4%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi K2 Thinking scored 62.4% on this benchmark.
ChartQA
86.2%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi K2 Thinking scored 86.2% on this benchmark.
DocVQA
94.5%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Kimi K2 Thinking scored 94.5% on this benchmark.
Terminal-Bench
47.1%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi K2 Thinking scored 47.1% on this benchmark.
ARC-AGI
12.5%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi K2 Thinking scored 12.5% on this benchmark.

About Kimi K2 Thinking

Learn about Kimi K2 Thinking's capabilities, features, and how it can help you achieve better results.

Trillion-Parameter Mixture of Experts

Kimi K2 Thinking is a trillion-parameter reasoning model built on a Mixture-of-Experts (MoE) architecture. Developed by Moonshot AI and released in late 2025, it activates only 32B parameters per token during inference, balancing massive knowledge capacity with computational efficiency. It is designed specifically as a thinking agent that scales its computation at inference time to solve complex logical problems, reflecting on its own reasoning and correcting mistakes before producing a final answer.

Agentic Tool Use and Planning

The model distinguishes itself through its capability to handle up to 300 sequential tool calls autonomously. While most standard language models struggle with long-horizon planning, K2 Thinking is engineered for agentic workflows such as autonomous web browsing and multi-step software engineering. It natively supports INT4 precision via Quantization-Aware Training, allowing the model to maintain frontier-level performance while running on standard enterprise hardware clusters.
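A sequential tool-call workflow like the one described above can be sketched against Moonshot's OpenAI-compatible endpoint. This is a minimal illustration, not the official agent harness: the `runTool` dispatcher, the `tools` schema argument, and the step cap are assumptions introduced here for clarity.

```javascript
// Sketch of a sequential tool-call loop. `runTool` (your tool dispatcher)
// and the step cap are illustrative assumptions, not part of the Moonshot API.
async function runAgent(client, userPrompt, tools, runTool, maxSteps = 300) {
  const messages = [{ role: 'user', content: userPrompt }];
  for (let step = 0; step < maxSteps; step++) {
    const response = await client.chat.completions.create({
      model: 'kimi-k2-thinking',
      messages,
      tools,
    });
    const msg = response.choices[0].message;
    messages.push(msg);
    // No tool calls means the model has produced its final answer.
    if (!msg.tool_calls || msg.tool_calls.length === 0) {
      return msg.content;
    }
    // Execute each requested tool and feed the result back to the model.
    for (const call of msg.tool_calls) {
      const result = await runTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}
```

Because the client is injected, the same loop runs unchanged against the real SDK client or a stub during testing.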

Developer and Research Focus

With a 256K token context window, the model is built for deep research and complex technical tasks. It bridges the performance gap between closed-source systems and open-weights models. Its ability to solve PhD-level science questions and competitive math problems makes it a suitable choice for academic research, automated coding assistants, and high-fidelity reasoning applications where logical consistency is the primary requirement.

Kimi K2 Thinking Use Cases

Discover the different ways you can use Kimi K2 Thinking to achieve great results.

Complex Software Engineering

Resolving real GitHub issues and architecting multi-file codebases using iterative self-correction.

Autonomous Research Agents

Executing hundreds of sequential tool calls to gather and synthesize obscure technical data.

Olympiad-Level Mathematics

Solving advanced geometry and algebra problems with deep chain-of-thought verification.

PhD-Level Science Inquiry

Answering expert questions in physics and biology that require multi-step logical deduction.

Interactive Computer Control

Navigating terminal environments and cloud infrastructure to automate DevOps workflows.

Logic-Heavy Creative Writing

Generating long-form content that requires strict adherence to intricate world-building rules.

Strengths

State-of-the-Art Reasoning: Scores 44.9% on HLE with tools, surpassing major closed-source models in expert-level logic.
Exceptional Agentic Depth: Capable of 300 sequential tool calls, enabling truly autonomous web research and browser tasks.
Top-Tier Mathematical Accuracy: Achieves 94.5% on AIME 2025, proving its reliability for high-level mathematical problem solving.
Open-Weights Accessibility: Offers frontier-level intelligence to the developer community for local deployment and fine-tuning.

Limitations

Massive Resource Requirements: Local inference requires at least 245GB of VRAM even with quantization, limiting its use to high-end server clusters.
Inherent Response Latency: The deep thinking process results in significant wait times as the model scales its test-time computation.
Lack of Native Multimodality: This variant cannot process image or video inputs directly, requiring a separate vision model for multimodal tasks.
High Token Overhead: Internal reasoning steps consume a large number of output tokens, which increases API costs for simple queries.

API Quick Start

moonshot/kimi-k2-thinking

Moonshot SDK
import OpenAI from 'openai';

// Moonshot's API is OpenAI-compatible, so the official OpenAI SDK works
// once it is pointed at the Moonshot base URL.
const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY, // set this in your environment
  baseURL: 'https://api.moonshot.cn/v1',
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'kimi-k2-thinking',
    messages: [{ role: 'user', content: 'Design a system for autonomous code review using 300 tool calls.' }],
  });
  console.log(response.choices[0].message.content);
}

main();

Install the SDK and start making API calls in minutes.
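The Streaming capability listed above works with the same SDK by passing `stream: true` to the create call. Below is a minimal sketch; `collectStream` is a hypothetical helper introduced here, not part of the SDK, and it works over any OpenAI-style chunk iterable.

```javascript
// Illustrative streaming helper (an assumption, not an SDK function):
// consumes OpenAI-style chunks as they arrive and invokes `onToken`
// for each delta instead of waiting for the complete reply.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? '';
    onToken(token);
    text += token;
  }
  return text;
}

// With the real client, request a stream and print tokens as they arrive:
// const stream = await client.chat.completions.create({
//   model: 'kimi-k2-thinking',
//   messages: [{ role: 'user', content: 'Summarize MoE routing.' }],
//   stream: true,
// });
// const full = await collectStream(stream, (t) => process.stdout.write(t));
```

Streaming is especially useful for a thinking model, since the deep-reasoning phase can otherwise leave the user staring at a blank screen.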

Community Feedback

See what the community thinks about Kimi K2 Thinking

Kimi K2.5 is the best open model for coding, they really cooked.
npc_gooner
reddit
Moonshot AI just dropped Kimi K2 Thinking. 300 sequential tool calls? That's the future of agentic AI.
@tech_trends
twitter
Kimi released Kimi K2 Thinking, an open-source trillion-parameter reasoning model. This is the real deal.
nekofneko
reddit
The fact that it can handle 300 tool calls sequentially opens up entirely new agent workflows.
AI Explained
youtube
Impressive to see an open-source model hitting these numbers. The test-time scaling approach is clearly paying off.
jsmith23
hackernews
Running this model locally is a challenge, but the reasoning depth is unlike anything else in the open weights space.
LocalLlamaEnthusiast
reddit

Related Videos

Watch tutorials, reviews, and discussions about Kimi K2 Thinking

Kimi K2 Thinking is the best AI model I've ever used.

It is the most agentic independent model ever made. Meaning, it can run for hours by itself.

It is able to think and reflect every single step of the way. So it never gets lost.

The reasoning speed is surprisingly fast despite the trillion parameters.

If you are building agents, this is the architecture you want to look at.

Kimi K2 Thinking... is a thinking upgrade to the Kimi K2 model, which truthfully seems to be very widely regarded.

This is of course an open-source model... coming in at a total size of around 1 trillion parameters.

All benchmark results are reported under int4 precision.

It handles complex math problems with a level of logic that rivals the top proprietary labs.

The installation process for the local weights is fairly straightforward if you have the VRAM.

Kimi K2.5 is the latest open-source model developed by a Chinese company called Moonshot AI.

It is capable of spinning up as many as 100 sub-agents and 1,500 tool calls and running them concurrently.

I would certainly recommend it if you want to make a truly beautiful website.

The internal chain of thought allows it to self-correct code errors before providing the final answer.

Moonshot has really focused on long-horizon planning for this specific release.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of Kimi K2 Thinking and achieve better results.

Enable Thinking Output

Use the special tokens flag in your inference engine to see the model's internal reasoning steps.

Optimize Temperature

Set the sampling temperature to 1.0 and min_p to 0.01 for the most consistent reasoning flow.
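These settings can be expressed as an OpenAI-style request body. Note that `min_p` is not part of the standard OpenAI parameter set; some inference engines (vLLM, for example) accept it as an extra field, so whether it is honored depends on your serving stack.

```javascript
// Sampling settings from the tip above, as a request body sketch.
// `min_p` is engine-specific (assumption: your serving stack accepts it);
// an OpenAI-compatible hosted API may silently ignore or reject it.
const samplingParams = {
  model: 'kimi-k2-thinking',
  temperature: 1.0, // recommended for a consistent reasoning flow
  min_p: 0.01,      // engine-specific; verify support before relying on it
};
```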

Utilize System Prompts

Start conversations with the official Moonshot AI identity prompt to stabilize the model's behavior.

Scale Test-Time Compute

Allow the model to generate more internal tokens for harder problems to increase accuracy.

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models


GPT-5.4

OpenAI

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

1M context
$2.50/$15.00/1M

GPT-5.2

OpenAI

GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.

400K context
$1.75/$14.00/1M

GLM-5

Zhipu (GLM)

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

200K context
$1.00/$3.20/1M

Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M

Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M

Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context
$3.00/$15.00/1M

Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context
$0.60/$3.00/1M

GPT-5.3 Codex

OpenAI

GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...

400K context
$1.75/$14.00/1M

Frequently Asked Questions

Find answers to common questions about Kimi K2 Thinking