zhipu

GLM-4.7

GLM-4.7 by Zhipu AI is a flagship 358B MoE model featuring a 200K context window, elite 73.8% SWE-bench performance, and native Deep Thinking for agentic workflows.

Zhipu · GLM · December 22, 2025
Context
200K tokens
Max Output
131K tokens
Input Price
$0.60 / 1M tokens
Output Price
$2.20 / 1M tokens
Modality: Text, Image
Capabilities: Vision, Tools, Streaming, Reasoning
Benchmarks
GPQA
85.7%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GLM-4.7 scored 85.7% on this benchmark.
HLE
42.8%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written, closed-ended questions spanning more than a hundred subjects, designed to test knowledge and reasoning at the limits of human expertise. GLM-4.7 scored 42.8% on this benchmark.
MMLU
90.1%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GLM-4.7 scored 90.1% on this benchmark.
MMLU Pro
84.3%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GLM-4.7 scored 84.3% on this benchmark.
SimpleQA
46%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GLM-4.7 scored 46% on this benchmark.
IFEval
88%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GLM-4.7 scored 88% on this benchmark.
AIME 2025
95.7%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GLM-4.7 scored 95.7% on this benchmark.
MATH
92%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GLM-4.7 scored 92% on this benchmark.
GSM8k
98%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GLM-4.7 scored 98% on this benchmark.
MGSM
94%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GLM-4.7 scored 94% on this benchmark.
MathVista
74%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GLM-4.7 scored 74% on this benchmark.
SWE-Bench
73.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GLM-4.7 scored 73.8% on this benchmark.
HumanEval
94.2%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GLM-4.7 scored 94.2% on this benchmark.
LiveCodeBench
84.9%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GLM-4.7 scored 84.9% on this benchmark.
MMMU
74.2%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. GLM-4.7 scored 74.2% on this benchmark.
MMMU Pro
58%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. GLM-4.7 scored 58% on this benchmark.
ChartQA
86%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. GLM-4.7 scored 86% on this benchmark.
DocVQA
93%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. GLM-4.7 scored 93% on this benchmark.
Terminal-Bench
41%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GLM-4.7 scored 41% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GLM-4.7 scored 12% on this benchmark.

Try GLM-4.7 Free

Chat with GLM-4.7 for free. Test its capabilities, ask questions, and explore what this AI model can do.

About GLM-4.7

Learn about GLM-4.7's capabilities, features, and how it can help you achieve better results.

GLM-4.7 is the latest flagship AI model from Zhipu AI, representing a significant leap in open-weight intelligence. This massive 358-billion parameter Mixture-of-Experts (MoE) model is specifically engineered for advanced reasoning, coding automation, and complex agentic workflows. It introduces a dedicated Deep Thinking mode that enables multi-step planning and error recovery, allowing the model to solve high-stakes software engineering tasks with unprecedented reliability.

The model distinguishes itself through exceptional technical performance, achieving a state-of-the-art 73.8% score on SWE-bench Verified and 84.9% on LiveCodeBench v6. With its 200,000-token context window and massive 131,072-token output capacity, GLM-4.7 is optimized for generating entire applications and conducting deep research across vast datasets.

As an open-weight release under the MIT license, it offers a powerful and flexible alternative to proprietary APIs, supporting both cloud-based integration and local hosting. Its multimodal capabilities extend to advanced UI design and document analysis, making it a versatile powerhouse for modern AI-driven development.
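
Because the weights are open, the same chat-completions calls can also be pointed at a self-hosted endpoint instead of Zhipu's cloud. Here is a minimal sketch assuming the weights are served through an OpenAI-compatible server such as vLLM; the port, base URL, and placeholder API key are illustrative assumptions, not official values.

import OpenAI from "openai";

// Point the standard OpenAI client at a local OpenAI-compatible server
// (for example, one started with vLLM). Adjust the URL to your setup.
const local = new OpenAI({
  baseURL: "http://localhost:8000/v1", // assumed local endpoint
  apiKey: "not-needed-for-local",      // most local servers ignore the key
});

async function main() {
  const reply = await local.chat.completions.create({
    model: "glm-4.7", // must match the model name your server registers
    messages: [{ role: "user", content: "Summarize this repository's architecture." }],
  });
  console.log(reply.choices[0].message.content);
}

main();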

Use Cases for GLM-4.7

Discover the different ways you can use GLM-4.7 to achieve great results.

Agentic Software Engineering

Resolving complex GitHub issues and implementing full-stack features autonomously across entire repositories.

High-Fidelity Vibe Coding

Rapidly generating modern, production-ready web interfaces using Tailwind CSS and interactive Framer Motion components.

Multilingual Technical Support

Providing advanced coding assistance and logical problem-solving in more than ten languages for international development teams.

Deep Academic Research

Analyzing massive document sets to extract multi-hop, verifiable information, of the kind measured by the BrowseComp browsing benchmark.

Automated Presentation Design

Creating structured, visually balanced slides with accurate layouts and typography from single-sentence prompts.

Terminal-Based Automation

Executing complex system administration and DevOps tasks directly within a terminal sandbox, with 41% accuracy on Terminal-Bench.

Strengths

Elite Coding Proficiency: Currently leads open-weight models with a 73.8% SWE-bench score, outperforming many proprietary competitors.
Massive Output Tokens: Features a 131K output limit, allowing for the generation of massive, production-ready codebases in a single turn.
Native Reasoning Engine: Incorporates 'Deep Thinking' capabilities that allow for better planning and reduced drift in long-running agentic tasks.
Unbeatable Cost-to-Performance: Provides frontier-level intelligence at a fraction of the cost, starting at just $0.60 per million input tokens.

Limitations

Extreme Hardware Intensity: The 358B parameter count makes local hosting prohibitive for individual developers without multi-GPU setups.
API vs Web Disparity: There is a noticeable performance gap between the instant API responses and the deeper reasoning found in the web interface.
Temporal Hallucinations: Users have reported occasional inaccuracies regarding current dates and events immediately following the model's launch.
High Reasoning Latency: Enabling the full Deep Thinking mode can significantly increase the response time for complex, multi-step prompts.

API Quick Start

zhipu/glm-4-7

View Documentation
Zhipu SDK
import { ZhipuAI } from "zhipuai";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  // Stream a completion with Deep Thinking enabled for multi-step planning.
  const response = await client.chat.completions.create({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Build a real-time collaborative whiteboard using Next.js." }],
    stream: true,
    thinking: { type: "enabled" }, // check the docs for the exact parameter shape
  });

  // Print tokens as they arrive; some chunks may carry no content.
  for await (const chunk of response) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
}

main();

Install the SDK and start making API calls in minutes.

What People Are Saying About GLM-4.7

See what the community thinks about GLM-4.7

"GLM 4.7 CRUSHES OPEN SOURCE RECORDS! ... hit 42.8% on Humanity's Last Exam"
MindColliers
x/twitter
"GLM-4.7... scores 73.8% on SWE-Bench at $0.6/M tokens... The AI race is becoming truly multipolar."
MateusGalasso
x/twitter
"GLM 4.7 brings clear gains... in multilingual agentic coding and terminal-based tasks"
Dear-Success-1441
reddit
"This model is crushing it on many 2025 coding benchmarks"
cloris_rust
reddit
"GLM 4.7 wins for speed and stability, while Minimax M2.1 dominates in multi-agent coding"
JamMasterJulian
youtube
"Zhipu is really showing what open weights can do against the big labs in the US."
DevGuru
hackernews

Videos About GLM-4.7

Watch tutorials, reviews, and discussions about GLM-4.7

GLM 4.7 is a model that delivers major improvements in code quality, complex reasoning, and tool usage

Scored 73.8 percent on SWE-bench Verified, which is absolutely incredible for an open-source model

It even surpasses Claude Sonnet 4.5 and GPT 5.1 in tool use benchmarks

The mixture of experts approach here is very refined, leading to higher efficiency despite the size

It is essentially the first open-weight model to provide a viable alternative to Claude 3.5 for heavy coding

It is the best open model yet by a long shot

It produces cleaner, more modern web pages and generates better looking slides

It reasons, but the thinking traces are not available in the coding plan API

Vibe coding results are near-perfect, even with complex Tailwind animations

The 200k context handles long repos with very little needle loss compared to previous GLM versions

Important upgrade is thinking before acting, which helps the model handle complex tasks reliably

Reviewers highlight vibe coding, where GLM 4.7 improves UI quality

API pricing is expected to stay around the same $3, making it a very cost-effective option

The multimodal performance allows it to convert Figma designs to code with high accuracy

Local deployment is possible if you have a massive workstation, but the API is remarkably fast

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of this model and achieve better results.

Enable Deep Thinking

For complex logical tasks, explicitly trigger the thinking mode via API parameters to enable multi-step planning.
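
As a minimal sketch, using the same SDK as the Quick Start above; the exact shape of the thinking parameter is an assumption based on Zhipu's published convention for GLM reasoning models, so verify it against the current API docs.

import { ZhipuAI } from "zhipuai";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  const response = await client.chat.completions.create({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Plan a zero-downtime Postgres migration." }],
    thinking: { type: "enabled" }, // assumed parameter shape; confirm in the docs
  });
  console.log(response.choices[0].message.content);
}

main();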

Leverage Preserved Thinking

Maintain long conversation histories to utilize the model's ability to retain reasoning traces across multiple turns.
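
A sketch of the pattern, reusing the Quick Start client: every turn, including the assistant's replies, is appended to one messages array, so later calls see the full conversation rather than an isolated prompt.

import { ZhipuAI } from "zhipuai";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });
const messages: { role: "user" | "assistant"; content: string }[] = [];

async function ask(question: string): Promise<string> {
  messages.push({ role: "user", content: question });
  const res = await client.chat.completions.create({
    model: "glm-4.7",
    messages,
    thinking: { type: "enabled" }, // assumed parameter shape, as above
  });
  const answer = res.choices[0].message.content ?? "";
  messages.push({ role: "assistant", content: answer }); // preserve the turn
  return answer;
}

async function main() {
  await ask("Design a schema for a booking app.");
  await ask("Now add multi-currency support."); // builds on the prior turn
}

main();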

Local Quantization

Use Unsloth-optimized 2-bit or 4-bit GGUF versions to run this high-parameter model on consumer-grade hardware.

Date Injection

Manually include the current date in the system prompt to avoid temporal hallucinations and improve scheduling accuracy.
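
For example, a minimal sketch that computes today's date at call time and pins it in the system prompt (the prompt wording is illustrative):

import { ZhipuAI } from "zhipuai";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  const today = new Date().toISOString().slice(0, 10); // e.g. "2025-12-22"
  const response = await client.chat.completions.create({
    model: "glm-4.7",
    messages: [
      { role: "system", content: `Today's date is ${today}. Use it for any date or scheduling calculations.` },
      { role: "user", content: "How many days are left in this quarter?" },
    ],
  });
  console.log(response.choices[0].message.content);
}

main();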

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.
