Kimi k2.6

Kimi k2.6 is Moonshot AI's 1T-parameter MoE model featuring a 256K context window, native video input, and elite performance in autonomous agentic coding.

Tags: Reasoning, Multimodal, Coding Agent, Open Weights, MoE
Moonshot · Kimi · April 20, 2026
Context: 256K tokens
Max Output: 33K tokens
Input Price: $0.95 / 1M tokens
Output Price: $4.00 / 1M tokens
Modality: Text, Image, Video
Capabilities: Vision, Tools, Streaming, Reasoning
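
The listed prices can be turned into a per-request estimate. The following is a minimal sketch, assuming the $0.95 and $4.00 per-million rates apply uniformly (no caching discounts or tiered pricing); `estimateCostUSD` is a hypothetical helper name, not part of any SDK.

```javascript
// Prices from the spec sheet on this page (USD per 1M tokens).
const PRICE_PER_MILLION = { input: 0.95, output: 4.0 };

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * PRICE_PER_MILLION.input;
  const outputCost = (outputTokens / 1_000_000) * PRICE_PER_MILLION.output;
  return inputCost + outputCost;
}

// Example: a 200K-token prompt with an 8K-token answer.
console.log(estimateCostUSD(200_000, 8_000).toFixed(3)); // 0.222
```

Output tokens cost roughly four times as much as input tokens, so long reasoning traces can dominate the bill even when the prompt is large.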
Benchmarks
GPQA
90.5%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). Kimi k2.6 scored 90.5% on this benchmark.
HLE
54%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning a wide range of academic subjects, designed to probe the frontier of human knowledge. Evaluates deep understanding of complex topics that require professional-level knowledge. Kimi k2.6 scored 54% on this benchmark.
MMLU
86.4%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. Kimi k2.6 scored 86.4% on this benchmark.
MMLU Pro
84.6%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. Kimi k2.6 scored 84.6% on this benchmark.
SimpleQA
43%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. Kimi k2.6 scored 43% on this benchmark.
IFEval
89.8%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. Kimi k2.6 scored 89.8% on this benchmark.
AIME 2025
97.3%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. Kimi k2.6 scored 97.3% on this benchmark.
MATH
98.2%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. Kimi k2.6 scored 98.2% on this benchmark.
GSM8k
97.3%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. Kimi k2.6 scored 97.3% on this benchmark.
MGSM
91.5%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. Kimi k2.6 scored 91.5% on this benchmark.
MathVista
67.1%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. Kimi k2.6 scored 67.1% on this benchmark.
SWE-Bench
80.2%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. Kimi k2.6 scored 80.2% on this benchmark.
HumanEval
92%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. Kimi k2.6 scored 92% on this benchmark.
LiveCodeBench
83.1%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. Kimi k2.6 scored 83.1% on this benchmark.
MMMU
77.3%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. Kimi k2.6 scored 77.3% on this benchmark.
MMMU Pro
75.6%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. Kimi k2.6 scored 75.6% on this benchmark.
ChartQA
87.4%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. Kimi k2.6 scored 87.4% on this benchmark.
DocVQA
94.9%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. Kimi k2.6 scored 94.9% on this benchmark.
Terminal-Bench
60.2%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. Kimi k2.6 scored 60.2% on this benchmark.
ARC-AGI
68.8%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. Kimi k2.6 scored 68.8% on this benchmark.

About Kimi k2.6

Learn about Kimi k2.6's capabilities, features, and how it can help you achieve better results.

Architectural Design and Scale

Kimi k2.6 is a frontier multimodal Mixture-of-Experts (MoE) model with one trillion total parameters, of which 32 billion are active per token. Activating only a fraction of the weights for each token keeps compute cost well below that of a dense model of the same size. The architecture supports internal chain-of-thought reasoning, in which the model generates hidden reasoning steps before producing its final response. This helps it on complex, multi-step problems that tend to stall models without a reasoning phase.
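
The parameter counts above imply some simple arithmetic. The sketch below computes the share of weights active per token and a rough weight-only memory footprint at different precisions. The function names and the bits-per-parameter values are illustrative assumptions; the page does not state which precision the released weights use.

```javascript
const TOTAL_PARAMS = 1e12;  // "1T-parameter" figure from this page
const ACTIVE_PARAMS = 32e9; // "32 billion active parameters per token"

// Share of the weights that participate in computing a single token.
function activeFraction(total, active) {
  return active / total;
}

// Rough weight-only memory in GB (1 GB = 1e9 bytes). Ignores KV cache,
// activations, and runtime overhead. bitsPerParam is an assumption.
function weightMemoryGB(params, bitsPerParam) {
  return (params * bitsPerParam) / 8 / 1e9;
}

console.log((activeFraction(TOTAL_PARAMS, ACTIVE_PARAMS) * 100).toFixed(1) + "%"); // 3.2%
console.log(weightMemoryGB(TOTAL_PARAMS, 4));  // 500
console.log(weightMemoryGB(TOTAL_PARAMS, 16)); // 2000
```

The 600GB local-hosting figure quoted elsewhere on this page is in the range that 4-bit weights plus overhead would produce, but that reading is a guess rather than something the page confirms.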

Agentic Intelligence and Coordination

The model is specifically optimized for autonomous software engineering and long-horizon tasks. It can manage Agent Swarms of up to 300 parallel sub-agents, which coordinate to refactor large codebases or manage complex DevOps pipelines. By using native tool calling and visual understanding, Kimi k2.6 operates as an autonomous agent capable of resolving multi-file GitHub issues and creating motion-rich web interfaces from visual references.
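
The page does not describe how a swarm divides its work. As one possible illustration of fanning a large refactor out under the 300-agent ceiling, the sketch below splits a file list into round-robin shards. `planShards` and the file names are hypothetical and do not reflect Moonshot's actual coordination mechanism.

```javascript
const MAX_SUB_AGENTS = 300; // upper bound quoted on this page

// Hypothetical planner: split work items into at most maxAgents shards,
// assigning items round-robin so shard sizes differ by at most one.
function planShards(items, maxAgents = MAX_SUB_AGENTS) {
  const shardCount = Math.min(maxAgents, items.length);
  const shards = Array.from({ length: shardCount }, () => []);
  items.forEach((item, i) => shards[i % shardCount].push(item));
  return shards;
}

const files = Array.from({ length: 1000 }, (_, i) => `src/module_${i}.ts`);
const shards = planShards(files);
console.log(shards.length);      // 300
console.log(shards[0].length);   // 4
console.log(shards[299].length); // 3
```

With fewer items than agents, the planner creates one shard per item instead of leaving empty shards.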

Multimodal Capabilities

Native support for video and image inputs distinguishes Kimi k2.6 from many open-weight peers. It processes video files directly to perform scene analysis, bug reproduction, and structured data extraction. The model serves as a visual architect, generating 3D shaders and complex animations using libraries like Three.js and GSAP based on visual descriptions or uploaded mockups.
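
For image input, a request might look like the sketch below. It assumes the Moonshot endpoint accepts OpenAI-style `image_url` content parts, which seems plausible given that the Quick Start uses the OpenAI SDK but is not confirmed on this page. The payload shape for video is not shown here, so it is left out. `buildImageMessage` and the URL are placeholders.

```javascript
// Hypothetical helper: build a user message that pairs text with an image,
// using the OpenAI-style "content parts" layout.
function buildImageMessage(prompt, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const message = buildImageMessage(
  "Recreate this mockup as a Three.js scene.",
  "https://example.com/mockup.png" // placeholder URL
);
console.log(JSON.stringify(message, null, 2));
```

The resulting object can be placed in the `messages` array of a chat completion request.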

Use Cases

Discover the different ways you can use Kimi k2.6 to achieve great results.

Autonomous Software Engineering

Resolving complex GitHub issues by coordinating up to 300 parallel sub-agents over 12-hour sessions.

Motion-Rich Frontend Generation

Creating modern web interfaces with WebGL shaders and GSAP animations from a single text or image prompt.

Deep Video Analysis

Analyzing recordings to perform visual bug reproduction, scene description, or structured data extraction.

Agentic Market Research

Executing multi-step web searches and tool calls to synthesize competitive analysis reports from hundreds of sources.

Legacy Code Optimization

Identifying performance bottlenecks in older codebases by analyzing CPU flame graphs and allocation data.

Scientific Problem Solving

Answering graduate-level science and math questions using Python-assisted reasoning and tool verification.

Strengths

Superior Agentic Coding: Achieves an 80.2% score on SWE-Bench Verified, placing it among the most capable models for autonomous engineering.
Massive Coordination Scale: Manages 300 parallel sub-agents, allowing it to handle enterprise-level refactoring tasks in a single pass.
Native Multimodal Versatility: Supports native video and image inputs, enabling advanced visual-language agent workflows for UI/UX tasks.
Aggressive Pricing Advantage: At $0.95 per million input tokens, it is significantly cheaper than proprietary competitors like Claude 3.7 or GPT-4o.

Limitations

High Local VRAM Requirements: Running the full model locally requires 600GB of VRAM, limiting self-hosting to specialized high-end workstations.
Regional API Latency: Infrastructure is optimized for Asia, which can lead to higher response times for users in Western regions.
Recall Gaps in Long Context: The model can struggle with perfect recall at the extreme edges of its 256,000-token buffer.
Restricted Commercial License: The open-weights release uses a modified license requiring specific compliance for large-scale enterprise deployments.

API Quick Start

moonshotai/kimi-k2.6

moonshot SDK
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: "https://api.moonshot.ai/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "kimi-k2.6",
    messages: [
      { role: "system", content: "You are a coding expert." },
      { role: "user", content: "Optimize this Rust function for throughput." }
    ],
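    // `extra_body` is a Python SDK argument; the Node SDK forwards
    // provider-specific fields placed at the top level of the request body.
    thinking: { type: "enabled" }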
  });

  console.log(completion.choices[0].message.content);
}

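// Report API or network failures instead of leaving the rejection unhandled.
main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});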

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about Kimi k2.6

"Meet Kimi K2.6: Advancing Open-Source Coding. One prompt, 100+ files. 4,000+ tool calls over 12 hours of continuous execution."
@Kimi_Moonshot, Twitter

"Kimi 2.6 BEATS Opus 4.7 And Is The BEST Open Source Model In The World. It's a very good model at 10x less cost."
@bindureddy, Twitter

"The pricing delta is the part nobody is pricing in. Kimi K2.6 is 5x cheaper than Sonnet 4.6. The benchmark gap has officially inverted."
@aakashgupta, Twitter

"I tried it against a bug I had. It resolved it successfully for a little over $1. It was a difficult bug that Sonnet struggled with."
@uworldhits1391, YouTube

"Kimi K2.6 is transformative, though it has room for recall improvements in ultra-long tasks. Still, 300 parallel agents is insane."
@Radiant-Act4707, Reddit

"The Kimi K2 series marks the moment where open-source frontier labs are finally rivaling and surpassing closed-source giants."
@zxytim, Twitter

Related Videos

Watch tutorials, reviews, and discussions about Kimi k2.6

Kimi K2.6 won't destroy Claude, but it WILL destroy the premium pricing of closed labs.

The agent swarm capability, 300 agents in parallel, is something we haven't seen in open source yet.

The HLE score of 54.0 is the highest we've seen for an open weights model.

One prompt can lead to 12 hours of continuous execution, which is a new frontier for agents.

It handles multi-step tool invocation with a stability that matches the best proprietary models.

The vision model supports native video input, which is a rare feature even in 2026.

It handles multi-step tool invocation with a stable thinking mode that rivals OpenAI's o-series.

For frontend development, the motion-rich generations are significantly better than K2.5.

The 256K context window allows for entire documentation sets to be parsed in one go.

It is one of the first models to show true autonomy in terminal environments.

Pairing K2.6 with the Kimi Code CLI allows for 12+ hour autonomous coding sessions.

It refactored an 8-year-old financial engine and got a 185% throughput gain autonomously.

This is a trillion-parameter model, but active parameters are only 32B, keeping it fast.

The cost savings for developers moving from Claude to Kimi are astronomical.

It resolved a bug in a complex Rust library that had been open for three months.

Pro Tips

Expert tips to help you get the most out of Kimi k2.6 and achieve better results.

Enable Tool Use for Reasoning

Benchmarks show the HLE score jumps from 23.9 to 54.0 when the model is allowed external search and computation tools.
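
Giving the model a tool means declaring it in the request and executing the calls it returns. The sketch below uses the OpenAI-style function-calling schema, on the assumption that the Moonshot endpoint follows it. The `add_numbers` tool and the `runToolCall` helper are hypothetical stand-ins for real search or computation tools.

```javascript
// Tool definition in the OpenAI-style function-calling schema.
const tools = [
  {
    type: "function",
    function: {
      name: "add_numbers", // hypothetical tool name
      description: "Add two numbers and return the sum.",
      parameters: {
        type: "object",
        properties: { a: { type: "number" }, b: { type: "number" } },
        required: ["a", "b"],
      },
    },
  },
];

// Local implementations, keyed by tool name.
const handlers = {
  add_numbers: ({ a, b }) => a + b,
};

// Execute one entry of message.tool_calls and build the "tool" role
// message that carries the result back to the model.
function runToolCall(toolCall) {
  const handler = handlers[toolCall.function.name];
  if (!handler) throw new Error(`Unknown tool: ${toolCall.function.name}`);
  const args = JSON.parse(toolCall.function.arguments);
  return {
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(handler(args)),
  };
}

const reply = runToolCall({
  id: "call_1",
  type: "function",
  function: { name: "add_numbers", arguments: '{"a": 2, "b": 40}' },
});
console.log(reply.content); // 42
```

In a real loop, `tools` is passed with the request, and each `reply` is appended to `messages` before asking the model to continue.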

Monitor Context Buffer Edges

Recall is most accurate in the first 200,000 tokens of the 256,000-token buffer.
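
One way to act on this is to cap the prompt at 200,000 tokens and put the most important material first. The sketch below is a rough illustration: the 4-characters-per-token estimate is a crude assumption (a real tokenizer should be used in practice), and `packContext` is a hypothetical helper.

```javascript
const RECALL_BUDGET_TOKENS = 200_000; // "first 200,000 tokens" from the tip

// Crude token estimate (assumption: about 4 characters per token).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep documents in priority order until the next one would exceed the budget.
function packContext(docs, budget = RECALL_BUDGET_TOKENS) {
  const kept = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc);
    if (used + cost > budget) break;
    kept.push(doc);
    used += cost;
  }
  return { kept, used };
}

const docs = ["a".repeat(400_000), "b".repeat(300_000), "c".repeat(200_000)];
const { kept, used } = packContext(docs);
console.log(kept.length, used); // 2 175000
```

Documents that do not fit can be summarized or moved to a follow-up request rather than pushed into the tail of the window.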

Use Thinking Mode Sparingly

Disable the thinking parameter for simple chat tasks to reduce latency and total token consumption.
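
A request builder can make this toggle explicit. In the sketch below, `"enabled"` is the value shown in the Quick Start, while `"disabled"` is an assumed counterpart that this page does not confirm. `buildRequest` is a hypothetical helper, and `thinking` is passed as a top-level field, which is how the Node SDK forwards provider-specific parameters.

```javascript
// Hypothetical helper: build chat-completion parameters with thinking on or off.
function buildRequest(messages, { thinking = false } = {}) {
  return {
    model: "kimi-k2.6",
    messages,
    // "enabled" appears in the Quick Start; "disabled" is an assumed counterpart.
    thinking: { type: thinking ? "enabled" : "disabled" },
  };
}

const quick = buildRequest([{ role: "user", content: "Say hello." }]);
const deep = buildRequest(
  [{ role: "user", content: "Find the race condition in this scheduler." }],
  { thinking: true }
);
console.log(quick.thinking.type, deep.thinking.type); // disabled enabled
```

Defaulting to thinking off keeps routine calls cheap and makes the expensive mode an explicit choice per request.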

Standardize with XML Tags

The model follows instructions more accurately when context and tasks are wrapped in XML tags.
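
A small helper keeps the tagging consistent across prompts. This is a minimal sketch: the tag names `context` and `task` are arbitrary choices rather than names the model requires, and the helper does not escape angle brackets inside the text.

```javascript
// Wrap a block of text in a named tag.
function tag(name, text) {
  return `<${name}>\n${text.trim()}\n</${name}>`;
}

// Hypothetical prompt layout: context first, task last.
function buildPrompt(context, task) {
  return [tag("context", context), tag("task", task)].join("\n\n");
}

console.log(buildPrompt("fn sum(v: &[u64]) -> u64 { ... }", "Optimize for throughput."));
```

The output places the source material inside `<context>` and the instruction inside `<task>`, separated by a blank line.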

Leverage Native Video Uploads

Use file upload methods rather than base64 encoding for videos over 100MB to avoid request size limits.
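
Part of the reason is that base64 inflates a payload by about a third. The sketch below computes the encoded size and picks a transport. It reads "100MB" as 100 MiB, which is an assumption, and `chooseVideoTransport` is a hypothetical helper that only returns a label and performs no upload itself.

```javascript
const INLINE_LIMIT_BYTES = 100 * 1024 * 1024; // "100MB" from the tip, read as MiB (assumption)

// Base64 encodes every 3 bytes as 4 characters, with padding.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical chooser between inline base64 and a separate file upload.
function chooseVideoTransport(byteLength) {
  return byteLength > INLINE_LIMIT_BYTES ? "file_upload" : "base64";
}

const sizeBytes = 150 * 1024 * 1024;
console.log(chooseVideoTransport(sizeBytes)); // file_upload
console.log(base64Length(sizeBytes));         // 209715200
```

A 150 MiB video grows to 200 MiB once encoded, which is why larger files are better sent through a file upload.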

Related AI Models

Gemini 3 Flash (Google)
Gemini 3 Flash is Google's high-speed multimodal model featuring a 1M token context window, elite 90.4% GPQA reasoning, and autonomous browser automation tools.
1M context · $0.50 input / $3.00 output per 1M tokens

Claude Sonnet 4.6 (Anthropic)
Claude Sonnet 4.6 offers frontier performance for coding and computer use with a massive 1M token context window for only $3/1M tokens.
1M context · $3.00 input / $15.00 output per 1M tokens

Claude Opus 4.6 (Anthropic)
Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.
1M context · $5.00 input / $25.00 output per 1M tokens

Gemini 3 Pro (Google)
Google's Gemini 3 Pro is a multimodal powerhouse featuring a 1M token context window, native video processing, and industry-leading reasoning performance.
1M context · $2.00 input / $12.00 output per 1M tokens

Qwen3.5-397B-A17B (Alibaba)
Qwen3.5-397B-A17B is Alibaba's flagship open-weight MoE model. It features native multimodal reasoning, a 1M context window, and a 19x decoding throughput...
1M context · $0.40 input / $2.40 output per 1M tokens

GPT-5.1 (OpenAI)
GPT-5.1 is OpenAI's advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...
400K context · $1.25 input / $10.00 output per 1M tokens

GPT-5.2 Pro (OpenAI)
GPT-5.2 Pro is OpenAI's 2025 flagship reasoning model featuring Extended Thinking for SOTA performance in mathematics, coding, and expert knowledge work.
400K context · $21.00 input / $168.00 output per 1M tokens

Kimi K2.5 (Moonshot)
Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.
256K context · $0.60 input / $3.00 output per 1M tokens

Frequently Asked Questions

Find answers to common questions about Kimi k2.6