Kimi K2 Thinking

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...

Provider: Moonshot (Kimi family)
Released: 2025-11-06
Context: 256K tokens
Max Output: 16K tokens
Input Price: $0.15 / 1M tokens
Output Price: $0.15 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming, Reasoning
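
To make the listed pricing concrete, here is a minimal sketch of a per-request cost estimate. It assumes the flat $0.15 per 1M tokens quoted on this page for both input and output; the function name and the example workload are illustrative, not part of any SDK.

```javascript
// Listed prices from this page, in USD per 1M tokens.
const PRICE_PER_MILLION = { input: 0.15, output: 0.15 };

// Estimate the cost of one request from its token counts.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * PRICE_PER_MILLION.input;
  const outputCost = (outputTokens / 1_000_000) * PRICE_PER_MILLION.output;
  return inputCost + outputCost;
}

// Hypothetical workload: a 200,000-token prompt with a 16,000-token reply.
console.log(estimateCostUSD(200_000, 16_000).toFixed(4)); // "0.0324"
```

Actual billing may differ (for example, if reasoning tokens are counted as output), so treat the result as a rough planning figure.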
Benchmarks
GPQA
93%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi K2 Thinking scored 93% on this benchmark.
HLE
44.9%
HLE: Humanity's Last Exam. Tests a model's ability to demonstrate expert-level reasoning across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge. Kimi K2 Thinking scored 44.9% on this benchmark.
MMLU
90%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi K2 Thinking scored 90% on this benchmark.
MMLU Pro
78%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi K2 Thinking scored 78% on this benchmark.
SimpleQA
55%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Kimi K2 Thinking scored 55% on this benchmark.
IFEval
92%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi K2 Thinking scored 92% on this benchmark.
AIME 2025
99.1%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi K2 Thinking scored 99.1% on this benchmark.
MATH
99.1%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi K2 Thinking scored 99.1% on this benchmark.
GSM8k
99%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi K2 Thinking scored 99% on this benchmark.
MGSM
95%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi K2 Thinking scored 95% on this benchmark.
MathVista
75%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi K2 Thinking scored 75% on this benchmark.
SWE-Bench
71.3%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi K2 Thinking scored 71.3% on this benchmark.
HumanEval
83%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi K2 Thinking scored 83% on this benchmark.
LiveCodeBench
83.1%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi K2 Thinking scored 83.1% on this benchmark.
MMMU
80%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Kimi K2 Thinking scored 80% on this benchmark.
MMMU Pro
60%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi K2 Thinking scored 60% on this benchmark.
ChartQA
88%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi K2 Thinking scored 88% on this benchmark.
DocVQA
94%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Kimi K2 Thinking scored 94% on this benchmark.
Terminal-Bench
55%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi K2 Thinking scored 55% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi K2 Thinking scored 12% on this benchmark.

About Kimi K2 Thinking

Learn about Kimi K2 Thinking's capabilities, features, and how it can help you achieve better results.

Trillion-Parameter Open Intelligence

Kimi K2 Thinking is a trillion-parameter open reasoning model from Moonshot AI, released in November 2025. It uses a Mixture-of-Experts (MoE) architecture with 1T total parameters, of which only 32B are activated for inference, making it both powerful and computationally efficient. Unlike standard language models, K2 Thinking is engineered as a "thinking agent": it scales test-time computation to perform deep logical reasoning, planning, and autonomous tool use.

Agentic Prowess and Scalability

The model is best known for its agentic capabilities: it can execute up to 300 sequential tool calls without human intervention, which suits it to complex research, competitive programming, and multi-step technical workflows. It runs natively at INT4 precision through Quantization-Aware Training, which lets this very large model run on accessible hardware clusters while outperforming closed-source models such as GPT-5 and Claude 4.5 on reasoning and browsing benchmarks such as HLE and BrowseComp.
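
The INT4 and Mixture-of-Experts figures can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted on this page (1T total parameters, 32B active, 4 bits per weight) and counts the weights alone. It ignores the KV cache, activations, and any layers kept at higher precision, so real memory use will be higher.

```javascript
// Figures from the text: 1T total parameters, 32B active, INT4 weights.
const TOTAL_PARAMS = 1e12;
const ACTIVE_PARAMS = 32e9;
const BITS_PER_WEIGHT = 4; // INT4

// Bytes needed to store the weights alone at a given precision.
function weightBytes(params, bitsPerWeight) {
  return (params * bitsPerWeight) / 8;
}

const totalGB = weightBytes(TOTAL_PARAMS, BITS_PER_WEIGHT) / 1e9;
const activeShare = ACTIVE_PARAMS / TOTAL_PARAMS;

console.log(totalGB); // 500
console.log((activeShare * 100).toFixed(1) + '%'); // "3.2%"
```

The 500 GB result is consistent with the "over 500GB of RAM" requirement listed under Limitations.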

Developer-First Architecture

Designed for developers, Kimi K2 Thinking offers a strong cost-to-performance ratio. With a 256K context window and support for extended chain-of-thought processing, it bridges the gap between local specialized models and enterprise-grade cloud APIs. Its training methodology focuses on long-horizon planning, allowing the model to reflect on, correct, and optimize its outputs iteratively.
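
As a rough illustration of working within the 256K context window, the sketch below checks whether a document plus a full-length reply should fit. The constants are assumptions: "256K" and "16K" are read as 256,000 and 16,000 tokens, and token counts are approximated at about four characters per token rather than measured with the model's real tokenizer.

```javascript
const CONTEXT_WINDOW = 256_000; // assumed reading of "256K tokens"
const MAX_OUTPUT = 16_000;      // assumed reading of "16K tokens"
const CHARS_PER_TOKEN = 4;      // rough heuristic, not the real tokenizer

// Approximate the token count of a piece of text.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the prompt plus the reserved output budget fits in the window.
function fitsInContext(text, reservedOutput = MAX_OUTPUT) {
  return estimateTokens(text) + reservedOutput <= CONTEXT_WINDOW;
}

console.log(fitsInContext('x'.repeat(800_000)));   // true  (about 200,000 tokens)
console.log(fitsInContext('x'.repeat(1_000_000))); // false (about 250,000 tokens)
```

For anything close to the limit, count tokens with the provider's own tooling instead of this heuristic.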

Use Cases for Kimi K2 Thinking

Discover the different ways you can use Kimi K2 Thinking to achieve great results.

Autonomous Research

Executing deep-dive web inquiries that require hundreds of sequential tool calls and iterative information verification.

Scientific Problem Solving

Tackling PhD-level mathematics and physics queries using Python tool execution and chain-of-thought processing.

Competitive Programming

Solving high-difficulty algorithmic challenges from platforms like Codeforces and LeetCode with expert-level accuracy.

Complex Code Debugging

Identifying and fixing logical errors in large multi-file codebases through exhaustive, long-horizon reasoning steps.

Legal and Compliance Analysis

Reviewing lengthy technical or legal documents across a 256K context window to identify subtle risks or contradictions.

Agentic AI Automation

Powering autonomous agents that can plan, act, reflect, and refine their own outputs for hours without human intervention.
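
The plan, act, reflect cycle behind these use cases can be sketched as a loop that keeps calling the model until it stops requesting tools, with a hard cap on tool calls. This is a minimal sketch, not Moonshot's implementation: `callModel` is a hypothetical adapter around whatever chat API is in use, the `{ content, toolCalls }` reply shape is an assumption, and the cap of 300 simply mirrors the figure quoted on this page.

```javascript
const MAX_TOOL_CALLS = 300; // cap taken from the figure quoted in the text

// callModel(messages) -> { content, toolCalls } is a hypothetical adapter;
// tools maps tool names to (possibly async) functions.
async function runAgent(callModel, tools, userPrompt, maxToolCalls = MAX_TOOL_CALLS) {
  const messages = [{ role: 'user', content: userPrompt }];
  let toolCallsUsed = 0;

  while (true) {
    const reply = await callModel(messages);
    const toolCalls = reply.toolCalls ?? [];
    messages.push({ role: 'assistant', content: reply.content ?? '', toolCalls });

    // No tool requests means the model has produced its final answer.
    if (toolCalls.length === 0) {
      return { answer: reply.content, toolCallsUsed };
    }

    for (const call of toolCalls) {
      if (toolCallsUsed >= maxToolCalls) {
        throw new Error(`Tool-call budget of ${maxToolCalls} exhausted`);
      }
      const tool = tools[call.name];
      const result = tool ? await tool(call.args) : `Unknown tool: ${call.name}`;
      toolCallsUsed += 1;
      // Feed the tool result back so the model can reflect on it next turn.
      messages.push({ role: 'tool', name: call.name, content: String(result) });
    }
  }
}
```

Keeping the budget check in the loop rather than trusting the model to stop gives a predictable upper bound on cost and runtime.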

Strengths

Agentic Depth: The only open-weights model capable of managing 200–300 sequential tool calls without performance degradation.
State-of-the-Art Reasoning: Outperforms GPT-5 and Claude 4.5 on Humanity's Last Exam (HLE) and BrowseComp through intensive test-time scaling.
Unrivaled Cost Efficiency: Priced at a flat $0.15/1M tokens, it offers frontier intelligence at a fraction of proprietary API costs.
Native INT4 Optimization: Native quantization via Quantization-Aware Training provides a 2x speed boost for local inference on accessible hardware.

Limitations

Text-Only Input: Currently lacks native multimodal vision support for processing direct image, video, or audio files.
Massive RAM Requirements: Local deployment of the full 1T architecture requires over 500GB of RAM or distributed Mac clusters.
Initial Token Latency: The intensive internal reasoning phase leads to a slower time-to-first-token compared to non-thinking LLMs.
Reasoning Verbosity: The model can generate excessively long chain-of-thought sequences even for relatively straightforward queries.

API Quick Start

moonshot/kimi-k2-thinking

moonshot SDK
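import OpenAI from 'openai';

// The OpenAI SDK is pointed at Moonshot's endpoint through baseURL.
// The API key is read from the MOONSHOT_API_KEY environment variable.
const openai = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: 'https://api.moonshot.ai/v1',
});

async function main() {
  // Send a single chat request to the reasoning model.
  const completion = await openai.chat.completions.create({
    model: 'kimi-k2-thinking',
    messages: [
      { role: 'system', content: 'You are Kimi, a reasoning AI by Moonshot AI.' },
      { role: 'user', content: 'Solve the Riemann Hypothesis proof verification task.' }
    ],
  });

  // Print the text of the first (and only) choice.
  console.log(completion.choices[0].message.content);
}

// Report API or network errors instead of leaving an unhandled rejection.
main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});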

Install the SDK (npm install openai) and start making API calls in minutes.
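
Because the model lists Streaming and Reasoning among its capabilities, a streamed response may be more practical than waiting for the full completion. The helper below is a sketch that accumulates OpenAI-style stream chunks into reasoning text and answer text. It accepts any async iterable of chunks, such as the object returned by `openai.chat.completions.create({ ..., stream: true })`. The `reasoning_content` field name is an assumption about how reasoning tokens are delivered; check the provider's documentation before relying on it.

```javascript
// Accumulate a streamed chat completion into reasoning text and answer text.
async function collectStream(stream) {
  let reasoning = '';
  let answer = '';
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta ?? {};
    // Assumption: reasoning tokens arrive in a `reasoning_content` field.
    if (delta.reasoning_content) reasoning += delta.reasoning_content;
    // Standard OpenAI-style answer tokens arrive in `content`.
    if (delta.content) answer += delta.content;
  }
  return { reasoning, answer };
}
```

Separating the two streams makes it easy to show the answer to users while logging the reasoning for debugging.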

What People Are Saying About Kimi K2 Thinking

See what the community thinks about Kimi K2 Thinking

"Kimi K2 Thinking is the best AI model I've ever used... no hallucinations and hundreds of tool calls."
Alex Finn
youtube
"The gap between closed and open continues to narrow even as the cost of tokens collapses."
Emad Mostaque
x
"Moonshot K2-Thinking is redefining local intelligent agents with 300 tool calls."
Brian Roemmele
x
"Finally a model that actually thinks through the prompt logic before answering!"
ai_user_2025
reddit
"China is really pushing the open-source open weights frontier with the Kimi series."
Nathan Lambert
x
"Absolutely mind-blowing performance on competitive math problems."
MathWizard
hackernews

Videos About Kimi K2 Thinking

Watch tutorials, reviews, and discussions about Kimi K2 Thinking

This is the most agentic independent model ever made.

It is able to think and reflect every single step of the way. So it never gets lost.

It's extremely cost effective... half the price of GPT-5 and about a tenth of the price of Sonnet 4.5.

It manages to avoid the common logic traps of standard LLMs.

Moonshot is really changing the game for open-weight accessibility.

It can execute up to 200 to 300 sequential tool calls without human interference.

K2 Thinking achieved a score of 60.2% on BrowseComp, significantly outperforming the human baseline of 29.2%.

China is really pushing the open-source open weights frontier.

The Mixture-of-Experts implementation here is incredibly efficient for 1 trillion parameters.

You get frontier-level reasoning for basically pennies on the dollar.

I've got it running here on a Mac Studio using the sudo sysctl wired limit.

We're using up 500 GB of RAM. Our processing speed has come to a crawl around 6.9 tokens a second.

It actually wrote this code down, but it didn't actually stop. It started thinking again.

Even with quantization, the logical coherence of this model remains elite.

The internal monologue shows exactly where it corrects its own coding errors.

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Enable Thinking Tags

When running locally via tools like llama.cpp, ensure you use the --special flag to correctly render internal <think> tokens.
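
Once the `<think>` tokens are visible in the raw output, they usually need to be separated from the final answer before display. The following is a minimal sketch that assumes the output contains at most one `<think>...</think>` block placed before the answer; the function name is illustrative.

```javascript
// Split raw model output into its <think>...</think> reasoning and final answer.
function splitThinking(raw) {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  // No think block: treat the whole output as the answer.
  if (!match) return { thinking: '', answer: raw.trim() };
  return {
    thinking: match[1].trim(),
    answer: raw.slice(match.index + match[0].length).trim(),
  };
}

const sample = '<think>2 + 2 is 4.</think>The answer is 4.';
console.log(splitThinking(sample));
// { thinking: '2 + 2 is 4.', answer: 'The answer is 4.' }
```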

Optimize Temperature

Set temperature to 1.0 and min_p to 0.01 for the most stable and rigorous reasoning results.
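
These sampling values can be kept in one place and merged into each request. The sketch below builds a chat request body for an OpenAI-compatible local server. It assumes the server accepts `min_p` as an extra sampling field, which hosted APIs may not; `buildRequest` is a hypothetical helper, not part of any SDK.

```javascript
// Recommended values from the tip above.
const REASONING_SAMPLING = { temperature: 1.0, min_p: 0.01 };

// Build a chat request body; per-call overrides win over the defaults.
// Assumption: the target server accepts `min_p` alongside `temperature`.
function buildRequest(prompt, overrides = {}) {
  return {
    model: 'kimi-k2-thinking',
    messages: [{ role: 'user', content: prompt }],
    ...REASONING_SAMPLING,
    ...overrides,
  };
}

console.log(JSON.stringify(buildRequest('Prove that 17 is prime.'), null, 2));
```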

Hardware Clustering

Deploy the INT4 quantized version on a cluster of two Mac Studio M3 Ultras with RDMA for a lossless 1T local experience.

Long-Horizon Planning

Structure prompts to ask explicitly for a step-by-step plan first, which plays to the model's long-horizon planning and search strengths.

Related AI Models

GPT-5.2 (openai)
GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.
400K context, $1.75/$14.00/1M

GPT-5.2 Pro (openai)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context, $21.00/$168.00/1M

Gemini 3 Pro (google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context, $2.00/$12.00/1M

Gemini 3 Flash (google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context, $0.50/$3.00/1M

GPT-5.1 (openai)
GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context, $1.25/$10.00/1M

Grok-4 (xai)
Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.
2M context, $3.00/$15.00/1M

Claude Opus 4.5 (anthropic)
Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.
200K context, $5.00/$25.00/1M

GLM-4.7 (zhipu)
GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic...
200K context, $0.60/$2.20/1M
