Kimi K2.7 Code

Kimi K2.7 Code is a 1T-parameter MoE model from Moonshot AI. It features a 262K-token context window and uses about 30% fewer thinking tokens than Kimi K2.6 on software engineering tasks.

Tags: Coding Flagship, Open Weights, MoE Architecture, Multimodal AI, Reasoning Model
moonshot · Kimi · June 12, 2026
Context: 262K tokens
Max Output: 262K tokens
Input Price: $0.95 / 1M tokens
Output Price: $4.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
65.8%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi K2.7 Code scored 65.8% on this benchmark.
HLE
38.2%
HLE: Humanity's Last Exam. A benchmark of expert-level questions across specialized domains. Evaluates deep understanding of complex topics that require professional-level knowledge and reasoning. Kimi K2.7 Code scored 38.2% on this benchmark.
MMLU
87.2%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi K2.7 Code scored 87.2% on this benchmark.
MMLU Pro
71.4%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi K2.7 Code scored 71.4% on this benchmark.
SimpleQA
52.4%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Kimi K2.7 Code scored 52.4% on this benchmark.
IFEval
88.5%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi K2.7 Code scored 88.5% on this benchmark.
AIME 2025
91.5%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi K2.7 Code scored 91.5% on this benchmark.
MATH
81.3%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi K2.7 Code scored 81.3% on this benchmark.
GSM8k
97.2%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi K2.7 Code scored 97.2% on this benchmark.
MGSM
92.4%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi K2.7 Code scored 92.4% on this benchmark.
MathVista
65.5%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi K2.7 Code scored 65.5% on this benchmark.
SWE-Bench
78.2%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi K2.7 Code scored 78.2% on this benchmark.
HumanEval
94.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi K2.7 Code scored 94.2% on this benchmark.
LiveCodeBench
68.5%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi K2.7 Code scored 68.5% on this benchmark.
MMMU
72.4%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Kimi K2.7 Code scored 72.4% on this benchmark.
MMMU Pro
48.2%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi K2.7 Code scored 48.2% on this benchmark.
ChartQA
84.2%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi K2.7 Code scored 84.2% on this benchmark.
DocVQA
90.1%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Kimi K2.7 Code scored 90.1% on this benchmark.
Terminal-Bench
67%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi K2.7 Code scored 67% on this benchmark.
ARC-AGI
12.5%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi K2.7 Code scored 12.5% on this benchmark.

About Kimi K2.7 Code

Learn about Kimi K2.7 Code's capabilities, features, and how it can help you achieve better results.

Trillion Parameter Mixture of Experts

Kimi K2.7 Code is the latest iteration of Moonshot AI's trillion-parameter Mixture of Experts (MoE) model, optimized for software engineering and agentic automation. The model activates 32 billion parameters per inference step (about 3.2 percent of the total), which keeps per-token compute far below that of a dense model of the same size. It also introduces a refined reasoning mechanism that uses 30 percent fewer thinking tokens than previous versions, so multi-turn technical problem solving is both faster and cheaper.
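To make the cost effect of shorter reasoning concrete, here is a rough sketch that prices one hypothetical request using the list prices on this page ($0.95 input, $4.00 output per 1M tokens). The token counts are made up for illustration, and the sketch assumes thinking tokens are billed at the output rate, which this page does not confirm.

```typescript
// List prices from this page, in USD per 1M tokens.
const INPUT_PRICE_PER_M = 0.95;
const OUTPUT_PRICE_PER_M = 4.0;

interface Usage {
  inputTokens: number;
  answerTokens: number; // visible completion tokens
  thinkingTokens: number; // reasoning tokens; ASSUMED to be billed at the output rate
}

// Estimated request cost in USD under the assumptions above.
function estimateCostUsd(u: Usage): number {
  const outputTokens = u.answerTokens + u.thinkingTokens;
  return (u.inputTokens * INPUT_PRICE_PER_M + outputTokens * OUTPUT_PRICE_PER_M) / 1_000_000;
}

// Hypothetical turn: 20k tokens of context, a 2k-token answer, 10k thinking tokens.
const baseline: Usage = { inputTokens: 20_000, answerTokens: 2_000, thinkingTokens: 10_000 };

// The same turn with 30 percent fewer thinking tokens.
const reduced: Usage = { ...baseline, thinkingTokens: Math.round(baseline.thinkingTokens * 0.7) };

console.log(estimateCostUsd(baseline).toFixed(4)); // 0.0670
console.log(estimateCostUsd(reduced).toFixed(4)); // 0.0550

// Share of parameters active per inference step: 32B of 1T.
console.log(((32e9 / 1e12) * 100).toFixed(1) + '%'); // 3.2%
```

In this made-up example the saving comes entirely from the output side, so the benefit grows with the share of a request spent on reasoning.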

Native Multimodality and Visual Context

This model is natively multimodal and processes text, image, and video inputs. Its 262,144 token context window handles large codebases and complex stack traces. By releasing the model as open weights, Moonshot AI provides an alternative to proprietary frontier models for developers building autonomous AI agents. It maintains consistency across long-horizon coding tasks and translates visual designs into functional code without needing intermediate text descriptions.

Use Cases

Discover the different ways you can use Kimi K2.7 Code to achieve great results.

Autonomous Agentic Coding

Powering multi-step agents that navigate complex file structures and execute multi-file refactors via terminal access.
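As a rough illustration of the terminal-access pattern, the sketch below defines one tool in the OpenAI-compatible function format and validates a tool call before anything reaches a shell. The tool name `run_command`, the allow-list, and the `ToolCall` shape are hypothetical choices for this example; whether Moonshot's endpoint accepts this exact schema is an assumption.

```typescript
// Tool schema in the OpenAI-compatible "function" format.
// The name and description are hypothetical, chosen for this sketch.
const runCommandTool = {
  type: 'function' as const,
  function: {
    name: 'run_command',
    description: 'Run an allow-listed shell command in the project workspace.',
    parameters: {
      type: 'object',
      properties: { command: { type: 'string' } },
      required: ['command'],
    },
  },
};

// Only the fields of a tool call that this sketch reads.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
}

interface ParsedCommand {
  ok: boolean;
  command?: string;
  error?: string;
}

// Programs the agent may invoke; everything else is rejected.
const ALLOWED_PROGRAMS = new Set(['ls', 'cat', 'git', 'npm']);

// Validate a model-issued tool call. Nothing is executed here.
function parseRunCommand(call: ToolCall): ParsedCommand {
  if (call.function.name !== runCommandTool.function.name) {
    return { ok: false, error: 'unknown tool' };
  }
  let args: unknown;
  try {
    args = JSON.parse(call.function.arguments);
  } catch {
    return { ok: false, error: 'arguments are not valid JSON' };
  }
  if (typeof args !== 'object' || args === null) {
    return { ok: false, error: 'arguments must be an object' };
  }
  const command = (args as { command?: unknown }).command;
  if (typeof command !== 'string' || command.trim() === '') {
    return { ok: false, error: 'missing command' };
  }
  const program = command.trim().split(/\s+/)[0];
  if (!ALLOWED_PROGRAMS.has(program)) {
    return { ok: false, error: `program not allowed: ${program}` };
  }
  return { ok: true, command: command.trim() };
}

const example = parseRunCommand({
  id: 'call_1',
  function: { name: 'run_command', arguments: '{"command":"git status"}' },
});
console.log(example.ok, example.command);
```

Keeping validation separate from execution means a rejected call can be returned to the model as a tool error message instead of crashing the agent loop.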

Visual-to-Code Translation

Converting complex UI designs or architecture diagrams directly into functional front-end or systems code.

Long-Horizon Debugging

Analyzing entire project histories and stack traces within the 262k context window to identify architectural bugs.

3D Scene Synthesis

Generating high-fidelity interactive 3D environments using Three.js or C++ from natural language descriptions.

Video-Based Quality Assurance

Analyzing recorded screen sessions or video demos to identify visual bugs and inconsistent UI transitions.

Legacy Modernization

Automating the migration of aging codebases to modern frameworks by maintaining a consistent chain of thought.

Strengths

Top-Tier Coding Benchmarks: Scores 78.2 percent on SWE-bench Verified and 94.2 percent on HumanEval, outperforming most open-weight models.
Reasoning Efficiency: Reduces thinking-token overhead by 30 percent compared to previous generations, speeding up complex cycles.
Native Video Support: One of the few models capable of processing direct video input for UI testing and visual debugging.
Price-to-Performance Ratio: Delivers GPT-5.5 level performance in coding tasks at a low cost of $0.95 per million input tokens.

Limitations

Inconsistent C++ Formatting: Can require multiple attempts to rewrite large C++ files without introducing minor syntax or formatting errors.
Context Window vs Competitors: While 262k is large, it trails the one million token context windows offered by Google Gemini 2.0.
Headless Browser Stability: Autonomous QA pipelines using headless Chrome can occasionally hang during long verification steps.
3D Physics Precision: Can struggle with realistic gravity or complex friction in generated physics simulations, requiring manual tuning.

API Quick Start

moonshot/kimi-k2.7-code

moonshot SDK
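import OpenAI from 'openai';

// Moonshot exposes an OpenAI-compatible endpoint, so the OpenAI SDK is reused.
const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: 'https://api.moonshot.cn/v1',
});

const response = await client.chat.completions.create({
  model: 'kimi-k2.7-code',
  messages: [{ role: 'user', content: 'Generate a 3D WebGL pendulum sim.' }],
  stream: true,
  // Non-standard field. The Node SDK has no `extra_body` option (that is the
  // Python SDK), so extra fields go directly in the request body.
  preserve_thinking: true,
});

// Print tokens as they arrive; chunks without text contribute an empty string.
for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}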

Install the OpenAI SDK (npm install openai) and start making API calls in minutes.
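When the full reply is needed as one string (for example, to write a generated file to disk), the streaming loop above can be wrapped in a small helper. The sketch below is one way to do that. `ChunkLike` is a hypothetical minimal type that mirrors only the fields the loop reads, and `fakeStream` is a stand-in so the helper can be tried without network access.

```typescript
// Minimal shape of a streamed chat-completion chunk (only the fields used here).
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate the text deltas of a stream into one string.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    // Chunks with no choices or no text content contribute nothing.
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}

// Stand-in stream for local testing; a real call would pass the SDK response instead.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: 'Hel' } }] };
  yield { choices: [{ delta: {} }] };
  yield { choices: [] };
  yield { choices: [{ delta: { content: 'lo' } }] };
}

collectText(fakeStream()).then((text) => console.log(text)); // prints "Hello"
```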

Community Feedback

See what the community thinks about Kimi K2.7 Code

"Kimi 2.7 ranked 2nd after Fable 5 and before GPT-5 xhigh... Kimi 2.7 is amazingly good." (Przemek Chojecki, Twitter)
"Kimi K2.7 Code just made Kimi K2.6 painfully outdated... it gave the most realistic rendering of water waves!" (GMI Cloud, Twitter)
"It is the #1 open weight model on SWE-bench (78.2%) and Terminal-Bench 2.1." (Vals AI, Twitter)
"Kimi-K2.7-Code is now released and open-sourced! Improved coding & agent performance over K2.6." (Kimi.ai, Twitter)
"It handled 50 legal PDFs in one go without breaking a sweat." (ThePromptEngineer, YouTube)
"The price is down from $20/month to $1.5/month with the API. Decent UX." (LocalLLaMA-User, Reddit)

Related Videos

Watch tutorials, reviews, and discussions about Kimi K2.7 Code

It started thinking much more and much longer.

2.7 delivered better results, faster, but a little bit more expensive in terms of total tokens used.

It went into deeper thinking into longer project implementation until actually succeeding.

It doesn't just output code, it plans the architecture first in its thinking tokens.

The logic in the Python script was flawless compared to the previous 2.6 version.

It has improved token efficiency over Kimi K2.6, reducing thinking token usage by approximately 30%.

The reasoning process is much more direct while keeping the high success rate of the model.

The gap between the two is not insane when you consider that this model is 12.5 times cheaper than Claude Fable.

This model is 12.5 times cheaper than Claude Fable at the current API pricing.

Performance on SWE-bench Verified is top-tier for an open-weight release.

The 256k context window is incredibly stable for multi-file project generation.

It handled the C++ logic without needing external library documentation.

The reasoning process is much more linear now without redundant loops.

It built the entire project structure in 15 minutes including the backend components.

It is the best open-weight model for coding tasks available right now on the market.

Pro Tips

Expert tips to help you get the most out of Kimi K2.7 Code and achieve better results.

Preserve Thinking Mode

Always enable preserve_thinking in your API calls to ensure the model uses its optimized reasoning chain for logic.

Multimodal Prompting

Provide screenshots of current bugs or UI mockups alongside text instructions to improve the success rate of code generation.
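A screenshot can be sent together with the instruction in a single user message. The sketch below builds such a message using the OpenAI-compatible content-part layout (`text` plus `image_url` with a base64 data URL). That this endpoint accepts the layout is an assumption, and `buildBugReportMessage` is a hypothetical helper name.

```typescript
// Content parts in the OpenAI-compatible chat format.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface UserMessage {
  role: 'user';
  content: ContentPart[];
}

// Pair a text instruction with a PNG screenshot supplied as base64.
function buildBugReportMessage(instruction: string, pngBase64: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: instruction },
      { type: 'image_url', image_url: { url: `data:image/png;base64,${pngBase64}` } },
    ],
  };
}

// Placeholder bytes; in practice, read the screenshot file and base64-encode it.
const message = buildBugReportMessage(
  'The submit button overlaps the footer on mobile. Fix the CSS.',
  'iVBORw0KGgo=',
);
console.log(JSON.stringify(message, null, 2));
```

The returned object can be placed in the `messages` array of a chat completion request.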

Manage Context Budget

Keep performance-critical instructions at the beginning or end of the prompt, where instruction following is most reliable.
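One way to apply this tip is to assemble prompts programmatically: state the critical instructions first, repeat them last, and fill the middle with as many files as fit. The sketch below does that against the 262,144-token window listed on this page. The four-characters-per-token estimate is a crude heuristic rather than the model's tokenizer, and `buildPrompt` and its section labels are hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 262_144; // context window listed on this page

// Crude heuristic, NOT the model's tokenizer: roughly 4 characters per token.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// Put critical instructions at both ends; add files in order until the budget is spent.
function buildPrompt(critical: string, files: string[], reservedForOutput: number): string {
  const budget = CONTEXT_WINDOW_TOKENS - reservedForOutput;
  const header = `INSTRUCTIONS:\n${critical}\n\n`;
  const footer = `\n\nREMINDER:\n${critical}`;
  let used = roughTokenCount(header) + roughTokenCount(footer);
  const kept: string[] = [];
  for (const file of files) {
    const cost = roughTokenCount(file);
    if (used + cost > budget) break; // drop the remaining files rather than overflow
    kept.push(file);
    used += cost;
  }
  return header + kept.join('\n\n') + footer;
}

const prompt = buildPrompt(
  'Do not change public function signatures.',
  ['// src/a.ts\nexport const a = 1;', '// src/b.ts\nexport const b = 2;'],
  16_000, // tokens held back for the reply
);
console.log(prompt);
```

Because the estimate is approximate, it is safer to leave extra headroom than to fill the window to the last token.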

CLI Integration

Use the official Kimi Code CLI for local development to take advantage of the model's native ability to interact with local environments.

Related AI Models

Claude 3.7 Sonnet (Anthropic)
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, delivering state-of-the-art coding capabilities, a 200k context window, and visible thinking.
200K context · $3.00 / $15.00 per 1M tokens (input / output)

DeepSeek-V3.2-Speciale (DeepSeek)
DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...
131K context · $0.28 / $0.42 per 1M tokens (input / output)

Gemini 3.5 Flash (Google)
Gemini 3.5 Flash is Google's high-speed multimodal model with a 1M context window, optimized for sub-second agentic loops and complex coding tasks.
1M context · $1.50 / $9.00 per 1M tokens (input / output)

MiMo V2.5 Pro (Xiaomi)
MiMo V2.5 Pro is Xiaomi's open-source 1.02T parameter MoE model featuring a 1M context window, native multimodality, and elite agentic coding performance.
1M context · $1.00 / $3.00 per 1M tokens (input / output)

Claude 4.5 Sonnet (Anthropic)
Anthropic's Claude Sonnet 4.5 delivers world-leading coding (77.2% SWE-bench) and a 200K context window, optimized for the next generation of autonomous agents.
200K context · $3.00 / $15.00 per 1M tokens (input / output)

Claude Fable 5 (Anthropic)
Anthropic's Claude Fable 5 is a Mythos-class model featuring a 1M context window and 128K output tokens. It excels at agentic coding and 3D physics.
1M context · $10.00 / $50.00 per 1M tokens (input / output)

Qwen 3.7 Max (Alibaba)
Qwen 3.7 Max is Alibaba's flagship AI model for deep reasoning and autonomous agent tasks, featuring a 256k context window and top-tier coding performance.
256K context · $1.20 / $6.00 per 1M tokens (input / output)

Qwen3.5-Omni (Alibaba)
Qwen3.5-Omni is a natively omnimodal AI by Alibaba Cloud, offering seamless audio-visual reasoning, real-time voice chat, and 256k context for low-latency apps.
256K context · $0.40 / $4.80 per 1M tokens (input / output)

Frequently Asked Questions

Find answers to common questions about Kimi K2.7 Code