GLM-4.7

GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic workflows.

Zhipu AI · GLM family · December 22, 2025
Context
200K tokens
Max Output
131K tokens
Input Price
$0.60 / 1M tokens
Output Price
$2.20 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming, Reasoning
Benchmarks
GPQA
42%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GLM-4.7 scored 42% on this benchmark.
HLE
43%
HLE: Humanity's Last Exam. A benchmark of expert-written questions spanning more than a hundred specialized subjects, designed to sit at the frontier of human knowledge and resist simple retrieval. GLM-4.7 scored 43% on this benchmark.
MMLU
88%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GLM-4.7 scored 88% on this benchmark.
MMLU Pro
76%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GLM-4.7 scored 76% on this benchmark.
SimpleQA
45%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GLM-4.7 scored 45% on this benchmark.
IFEval
89%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GLM-4.7 scored 89% on this benchmark.
AIME 2025
96%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GLM-4.7 scored 96% on this benchmark.
MATH
96%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GLM-4.7 scored 96% on this benchmark.
GSM8k
97%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GLM-4.7 scored 97% on this benchmark.
MGSM
94%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GLM-4.7 scored 94% on this benchmark.
MathVista
68%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GLM-4.7 scored 68% on this benchmark.
SWE-Bench
74%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GLM-4.7 scored 74% on this benchmark.
HumanEval
92%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GLM-4.7 scored 92% on this benchmark.
LiveCodeBench
85%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GLM-4.7 scored 85% on this benchmark.
MMMU
70%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GLM-4.7 scored 70% on this benchmark.
MMMU Pro
55%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GLM-4.7 scored 55% on this benchmark.
Terminal-Bench
41%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GLM-4.7 scored 41% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GLM-4.7 scored 12% on this benchmark.

About GLM-4.7

Learn about GLM-4.7's capabilities, features, and how it can help you achieve better results.

Model Overview

GLM-4.7 is a flagship large language model developed by Zhipu AI. It utilizes a Mixture-of-Experts (MoE) architecture with 358 billion total parameters. The model is specifically designed to handle complex agentic tasks and long-context reasoning through its unique Preserved Thinking and Interleaved Thinking capabilities. These features allow the model to maintain stable logic and intermediate reasoning states across multi-turn sessions, addressing the context decay common in autonomous workflows.

Performance and Architecture

The model offers an expansive 200,000-token context window combined with a massive 131,072-token output capacity. This makes it suitable for generating entire applications or analyzing extensive documentation in a single pass. Released under the MIT license as an open-weight model, it provides high-performance coding and reasoning at a fraction of the cost of proprietary alternatives.

Integration and Use

It is fully compatible with the OpenAI API format, simplifying integration into existing software ecosystems. Developers use it for high-stakes software engineering tasks, where it achieves a 73.8% score on SWE-bench Verified. Its ability to process and analyze high volumes of technical documentation across English and Chinese with native-level linguistic nuance makes it a versatile tool for international development teams.

GLM-4.7 Use Cases

Discover the different ways you can use GLM-4.7 to achieve great results.

Autonomous Software Engineering

Utilizing the 73.8% SWE-bench capability to autonomously debug, refactor, and implement new features across complex repositories.

High-Capacity Document Synthesis

Leveraging the 131k output limit to generate comprehensive technical manuals or entire book chapters from large datasets.

Long-Horizon Agentic Workflows

Deploying agents that use Preserved Thinking to maintain consistency and logic over hundreds of sequential tasks without losing context.

Bilingual Business Intelligence

Processing and analyzing high volumes of technical documentation across English and Chinese with native-level linguistic nuance.

Automated UI/UX Code Generation

Generating complete React or Next.js front-end architectures with advanced animations and production-ready styling in a single shot.

Competition-Level Mathematical Solving

Solving complex Olympiad-level math problems and symbolic logic puzzles using the dedicated reasoning-heavy thinking mode.

Strengths

Elite Coding Performance: Scores 73.8% on SWE-bench Verified, outperforming almost every open-source model and matching top-tier proprietary APIs.
Massive Output Ceiling: The 131,072-token output limit is one of the highest in the industry, enabling the generation of entire applications in one turn.
Agent-First Architecture: Features Preserved Thinking to maintain logical consistency across long-horizon tasks, solving context decay in autonomous agents.
High Economic Value: Provides frontier-level intelligence at roughly 4 to 7 times lower cost than Western competitors like OpenAI or Anthropic.

Limitations

Text-Only Modality: Unlike Gemini or GPT-4o, GLM-4.7 lacks native vision or audio processing, requiring external models for multimodal tasks.
Massive Local Requirements: At 358B parameters, running the model locally requires significant hardware (approx. 710GB VRAM), making it inaccessible to consumer GPUs.
Occasional Latency Spikes: Users on the personal API tier report periodic slowdowns during peak hours compared to the infrastructure of larger providers.
Instruction Adherence Quirks: While strong at reasoning, the model sometimes ignores specific file-structure constraints in highly complex coding sessions.

API Quick Start

zai/glm-4.7

View Documentation
zhipu SDK
import OpenAI from 'openai';

// The Z.ai endpoint is OpenAI-compatible, so the official SDK works unchanged.
const client = new OpenAI({
  apiKey: 'YOUR_ZAI_API_KEY',
  baseURL: 'https://api.z.ai/api/paas/v4/',
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'glm-4.7',
    messages: [{ role: 'user', content: 'Design a scalable React architecture.' }],
    // `thinking` is a Z.ai-specific extension that enables reasoning traces.
    thinking: { type: 'enabled' }
  });
  console.log(response.choices[0].message.content);
}

main();

Install the SDK and start making API calls in minutes.
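Streaming is listed among the model's capabilities, and since the endpoint is OpenAI-compatible, a streamed call should follow the usual SDK pattern. A minimal sketch, assuming the endpoint mirrors OpenAI's `stream: true` chunk format (`client` constructed as in the snippet above):

```javascript
// Hedged sketch: streaming GLM-4.7 output via the OpenAI-compatible endpoint.
// Assumes Z.ai mirrors OpenAI's streaming chunk shape; verify against the docs.
function buildStreamingRequest(prompt) {
  return {
    model: 'glm-4.7',
    messages: [{ role: 'user', content: prompt }],
    stream: true, // deltas arrive chunk by chunk instead of one final message
  };
}

async function streamCompletion(client, prompt) {
  const stream = await client.chat.completions.create(buildStreamingRequest(prompt));
  for await (const chunk of stream) {
    // Each chunk carries an incremental content delta (may be empty).
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}
```

Streaming is especially useful with this model's very large output ceiling, since waiting for a full 131K-token response in one shot can take minutes.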

Community Feedback

See what the community thinks about GLM-4.7

GLM-4.7 handles large codebases reliably with its 128k context. It's been surprisingly useful for subagent tasks to save on primary API costs.
IulianHI
reddit
Zhipu AI's GLM-4.7 matches proprietary frontier models like GPT-5.1 High in coding. The Preserved Thinking feature is a huge win for autonomous agents.
Etienne Noumen
youtube
GLM-4.7 continues to be the most intelligent open weights model in the Intelligence Index v4.0, placing ahead of DeepSeek V3.2.
Artificial Analysis
twitter
Chinese models are closing the gap fast in coding utility. This 73% SWE-bench score is no joke for an open weight release.
Epoch AI
hackernews
The reasoning speed is actually quite decent for a model of this size. It handles the complex logic much better than previous iterations.
Bijan Bowen
youtube
GLM-4.7 lands #6 on the AI Index, surpassing Kimi K2. Discover why this $2 model is replacing GPT-5.2 in coding workflows.
TowardsAI
twitter

Related Videos

Watch tutorials, reviews, and discussions about GLM-4.7

The context length here is 200k and the maximum output tokens is 128k which is quite beefy actually.

All right, that is really quite impressive. None of them put in a special feature with that level of complexity.

The reasoning speed is actually quite decent for a model of this size.

It handles the complex logic much better than previous iterations.

This model is a significant step up in terms of logical consistency.

The GLM model actually implemented a better architecture by placing all the mock data in one file.

This one is definitely a huge leap. Those benchmarks are justified by the testing I've done.

It understood the context of the entire project without me needing to remind it.

The coding capability is arguably on par with the best models out there.

You are getting high-end reasoning at a fraction of the cost.

It scored 73.8 percent on SWE-bench Verified, which is absolutely incredible for an open-source model.

You can actually see that it functions and actually works. Whereas the Gemini 3 Pro generation doesn't work at all.

The speed of generation for this level of intelligence is remarkable.

It is clearly designed for developers who need reliable code output.

Zhipu AI has really outdone themselves with the MoE architecture tuning here.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of GLM-4.7 and achieve better results.

Enable Thinking Mode for Logic

Set the thinking parameter to enabled for coding or math tasks to utilize the model's internal reasoning traces and improve accuracy.
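As a sketch of that toggle (the `thinking` parameter is a Z.ai extension rather than part of the standard OpenAI schema; its shape here is assumed from the quick-start example above):

```javascript
// Hedged sketch: toggle GLM-4.7's thinking mode per request.
// `thinking` is assumed to take { type: 'enabled' | 'disabled' }.
function buildChatRequest(prompt, { think = false } = {}) {
  return {
    model: 'glm-4.7',
    messages: [{ role: 'user', content: prompt }],
    // Enable for coding/math where reasoning traces improve accuracy;
    // leave disabled for simple lookups to cut latency and token cost.
    thinking: { type: think ? 'enabled' : 'disabled' },
  };
}
```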

Use OpenAI-Compatible SDKs

Integrate GLM-4.7 into existing workflows by using the OpenAI SDK and changing the base URL to the Z.ai endpoint.

Maximize the 131K Output

When generating long-form content, provide a detailed outline first to help the model maintain structural coherence over the massive token limit.
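The outline-first pattern can be sketched as a two-pass call, where the first response is fed back into the second prompt. The prompt wording below is illustrative, not an official recipe:

```javascript
// Hedged sketch: outline-first, two-pass long-form generation.
// Pass 1 asks for a structural outline; pass 2 feeds the outline back
// so the model keeps its structure over a very long draft.
function buildDraftMessages(topic, outlineText) {
  return [
    {
      role: 'user',
      content: `Write the full document on "${topic}", following this outline exactly:\n${outlineText}`,
    },
  ];
}

async function generateLongForm(client, topic) {
  // Pass 1: get a section-by-section outline.
  const outline = await client.chat.completions.create({
    model: 'glm-4.7',
    messages: [{ role: 'user', content: `Write a detailed section-by-section outline for: ${topic}` }],
  });
  const outlineText = outline.choices[0].message.content;

  // Pass 2: generate the full draft against the outline.
  const draft = await client.chat.completions.create({
    model: 'glm-4.7',
    messages: buildDraftMessages(topic, outlineText),
  });
  return draft.choices[0].message.content;
}
```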

Optimize System Prompts for Agents

Define the Preserved Thinking requirements in the system message to ensure the model reuses reasoning states across multi-turn sessions.
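One way to phrase such a system message (the exact behavior of Preserved Thinking is assumed from the vendor's description; this prompt is an illustrative starting point, not an official recipe):

```javascript
// Hedged sketch: a system message nudging GLM-4.7 to carry reasoning state
// across turns in a long-horizon agent loop.
const agentMessages = [
  {
    role: 'system',
    content: [
      'You are a long-horizon coding agent.',
      'Carry forward your plan and open questions from previous turns;',
      'do not restart your analysis from scratch on each message.',
    ].join(' '),
  },
  { role: 'user', content: 'Task 1 of 200: audit the repository layout.' },
];
```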

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models

Qwen3-Coder-Next

alibaba

Qwen3-Coder-Next is Alibaba Cloud's elite Apache 2.0 coding model, featuring an 80B MoE architecture and 256k context window for advanced local development.

262K context
$0.12/$0.75/1M
GPT-4o mini

OpenAI

OpenAI's most cost-efficient small model, GPT-4o mini offers multimodal intelligence and high-speed performance at a significantly lower price point.

128K context
$0.15/$0.60/1M
MiniMax M2.5

minimax

MiniMax M2.5 is a SOTA MoE model featuring a 1M context window and elite agentic coding capabilities at disruptive pricing for autonomous agents.

1M context
$0.15/$1.20/1M
Gemini 3.1 Flash Live Preview

Google

Gemini 3.1 Flash Live Preview is Google's ultra-low-latency, audio-to-audio model featuring a 131K context window, high-fidelity multimodal reasoning, and...

131K context
$0.75/$4.50/1M
GPT-5.4

OpenAI

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

1M context
$2.50/$15.00/1M
Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M
GPT-5.3 Instant

OpenAI

Explore GPT-5.3 Instant, OpenAI's "Anti-Cringe" model. Features a 128K context window, 26.8% fewer hallucinations, and a natural, helpful tone for everyday...

128K context
$1.75/$14.00/1M
Gemini 3.1 Pro

Google

Gemini 3.1 Pro is Google's elite multimodal model featuring the DeepThink reasoning engine, a 1M+ context window, and industry-leading ARC-AGI logic scores.

1M context
$2.00/$12.00/1M

Frequently Asked Questions

Find answers to common questions about GLM-4.7