
GLM-5

GLM-5 is Zhipu AI's 744B-parameter open-weight powerhouse, excelling at long-horizon agentic tasks, coding, and factual accuracy with a 200K context window.

Open Weights · Agentic Engineering · MoE · Zhipu AI · Coding AI
Zhipu · GLM · February 11, 2026
Context: 200K tokens
Max Output: 128K tokens
Input Price: $1.00 / 1M tokens
Output Price: $3.20 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming, Reasoning
Benchmarks
GPQA
68.2%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GLM-5 scored 68.2% on this benchmark.
HLE
32%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning more than a hundred subjects, designed as a maximally difficult test of closed-ended academic knowledge and reasoning. GLM-5 scored 32% on this benchmark.
MMLU
85%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GLM-5 scored 85% on this benchmark.
MMLU Pro
70.4%
MMLU Pro: MMLU Professional Edition. An enhanced version of MMLU with 12,032 questions using a harder 10-option multiple choice format. Covers Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy, and Computer Science. GLM-5 scored 70.4% on this benchmark.
SimpleQA
48%
SimpleQA: Factual Accuracy Benchmark. Tests a model's ability to provide accurate, factual responses to straightforward questions. Measures reliability and reduces hallucinations in knowledge retrieval tasks. GLM-5 scored 48% on this benchmark.
IFEval
88%
IFEval: Instruction Following Evaluation. Measures how well a model follows specific instructions and constraints. Tests the ability to adhere to formatting rules, length limits, and other explicit requirements. GLM-5 scored 88% on this benchmark.
AIME 2025
84%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GLM-5 scored 84% on this benchmark.
MATH
88%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GLM-5 scored 88% on this benchmark.
GSM8k
97%
GSM8k: Grade School Math 8K. 8,500 grade school-level math word problems requiring multi-step reasoning. Tests basic arithmetic and logical thinking through real-world scenarios like shopping or time calculations. GLM-5 scored 97% on this benchmark.
MGSM
90%
MGSM: Multilingual Grade School Math. The GSM8k benchmark translated into 10 languages including Spanish, French, German, Russian, Chinese, and Japanese. Tests mathematical reasoning across different languages. GLM-5 scored 90% on this benchmark.
MathVista
0%
MathVista: Mathematical Visual Reasoning. Tests the ability to solve math problems that involve visual elements like charts, graphs, geometry diagrams, and scientific figures. Combines visual understanding with mathematical reasoning. GLM-5 is text-only and does not accept image inputs, so it is listed at 0% on this benchmark.
SWE-Bench
77.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GLM-5 scored 77.8% on this benchmark.
HumanEval
90%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GLM-5 scored 90% on this benchmark.
LiveCodeBench
52%
LiveCodeBench: Live Coding Benchmark. Tests coding abilities on continuously updated, real-world programming challenges. Unlike static benchmarks, uses fresh problems to prevent data contamination and measure true coding skills. GLM-5 scored 52% on this benchmark.
MMMU
0%
MMMU: Multimodal Understanding. Massive Multi-discipline Multimodal Understanding benchmark testing vision-language models on college-level problems across 30 subjects requiring both image understanding and expert knowledge. As a text-only model, GLM-5 is listed at 0% on this benchmark.
MMMU Pro
0%
MMMU Pro: MMMU Professional Edition. Enhanced version of MMMU with more challenging questions and stricter evaluation. Tests advanced multimodal reasoning at professional and expert levels. As a text-only model, GLM-5 is listed at 0% on this benchmark.
ChartQA
0%
ChartQA: Chart Question Answering. Tests the ability to understand and reason about information presented in charts and graphs. Requires extracting data, comparing values, and performing calculations from visual data representations. As a text-only model, GLM-5 is listed at 0% on this benchmark.
DocVQA
0%
DocVQA: Document Visual Q&A. Document Visual Question Answering benchmark testing the ability to extract and reason about information from document images including forms, reports, and scanned text. As a text-only model, GLM-5 is listed at 0% on this benchmark.
Terminal-Bench
56.2%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GLM-5 scored 56.2% on this benchmark.
ARC-AGI
12%
ARC-AGI: Abstraction & Reasoning. Abstraction and Reasoning Corpus for AGI - tests fluid intelligence through novel pattern recognition puzzles. Each task requires discovering the underlying rule from examples, measuring general reasoning ability rather than memorization. GLM-5 scored 12% on this benchmark.

About GLM-5

Learn about GLM-5's capabilities, features, and how it can help you achieve better results.

GLM-5 is Zhipu AI's next-generation flagship foundation model, specifically engineered to redefine the state of Agentic Engineering for open-weight systems. Built on a massive 744 billion parameter Mixture of Experts (MoE) architecture with 40 billion active parameters, it is the first open-weights model to bridge the performance gap with proprietary giants like Claude 4.5. This model excels in logic density and software engineering, achieving a breakthrough 77.8% on SWE-Bench Verified.

Technically, GLM-5 integrates Multi-head Latent Attention (MLA) and sparse attention mechanisms to improve token efficiency and cut memory overhead by 33%. Trained on 28.5 trillion tokens using a domestic cluster of 100,000 Huawei Ascend chips, GLM-5 demonstrates that frontier-level reasoning is possible without dependence on high-end NVIDIA hardware. With its 200,000-token context window and specialized 'Thinking Mode,' it provides robust, low-hallucination outputs for high-precision technical workflows.

Optimized for reliability, GLM-5 serves as a foundation for autonomous technical agents capable of maintaining persistent state across long-horizon executions. Its permissive MIT licensing and competitive pricing of $1.00 per million input tokens make it an ideal choice for enterprises seeking local deployment or high-scale API integration without the restrictive terms of proprietary alternatives.
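
At the listed rates, budgeting a workload is simple arithmetic: cost = input tokens × $1.00/1M + output tokens × $3.20/1M. A minimal TypeScript sketch using this page's prices; the token counts are illustrative:

// Per-token rates derived from this page's listed pricing.
const INPUT_RATE = 1.0 / 1_000_000; // USD per input token
const OUTPUT_RATE = 3.2 / 1_000_000; // USD per output token

function estimateCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// Example: a 150k-token repo audit producing a 20k-token report.
console.log(estimateCost(150_000, 20_000).toFixed(2)); // "0.21"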


Use Cases for GLM-5

Discover the different ways you can use GLM-5 to achieve great results.

Complex Systems Engineering

Designing and maintaining microservice architectures with autonomous dependency management.

Long-Horizon Agentic Tasks

Executing multi-step technical workflows that require persistent memory for over an hour of execution.

Legacy Codebase Migration

Refactoring entire repositories and updating outdated dependencies across a 200k token window.

Low-Hallucination Technical Research

Conducting high-precision technical research where factual accuracy and abstention are paramount.

Autonomous Terminal Operations

Powering dev-agents that can autonomously run security audits and system administration commands.

Bilingual Global Deployment

Providing top-tier English and Chinese reasoning for localized enterprise applications at scale.

Strengths

Elite Agentic Intelligence: Achieves the highest Agentic Index score (63) among open-weight models for multi-step task execution.

Low Hallucination Rate: Exhibits a 56% reduction in hallucinations compared to previous generations, prioritizing factual accuracy.

Massive MoE Efficiency: The 744B parameter architecture provides flagship logic density while MLA reduces memory overhead by 33%.

Permissive MIT License: Released under a true open-source license, allowing unrestricted commercial use without restrictive user carve-outs.

Limitations

No Native Multimodality: Lacks the vision, audio, and video processing capabilities found in multimodal competitors like GPT-4o.

Extreme Hosting Requirements: The 1.5TB of BF16 weights put local deployment out of reach for nearly all users without cloud infrastructure.

High Inference Latency: Initial time-to-first-token can exceed 7 seconds on public APIs compared to smaller 'flash' models.

Frontend Design Nuance: While excellent at logic, it can occasionally struggle with fine-grained CSS aesthetic polish compared to Claude.

API Quick Start

zai/glm-5

View Documentation
zhipu SDK
import { ZhipuAI } from "zhipuai-sdk";

// Authenticate with your Zhipu API key.
const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  // Request a streamed chat completion from GLM-5.
  const response = await client.chat.completions.create({
    model: "glm-5",
    messages: [{ role: "user", content: "Analyze this repo for security vulnerabilities." }],
    stream: true,
  });

  // Print each streamed token as it arrives.
  for await (const chunk of response) {
    process.stdout.write(chunk.choices[0].delta.content || "");
  }
}

main();
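
If you already use the OpenAI SDK, Zhipu's platform has historically exposed an OpenAI-compatible endpoint; a minimal sketch, assuming that compatibility extends to GLM-5 and that the bigmodel.cn v4 base URL still applies:

import OpenAI from "openai";

// Assumed: GLM-5 is served through Zhipu's OpenAI-compatible v4 endpoint.
const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  baseURL: "https://open.bigmodel.cn/api/paas/v4/",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "glm-5",
    messages: [{ role: "user", content: "Summarize the tradeoffs of MoE architectures." }],
  });
  console.log(completion.choices[0].message.content);
}

main();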

Install the SDK and start making API calls in minutes.

What People Are Saying About GLM-5

See what the community thinks about GLM-5

"GLM-5 is the new open weights leader! It scores 50 on the Intelligence Index, a significant closing of the gap."
Artificial Analysis
x
"This model is unbelievable. I successfully ran a job that took over an hour... blew me away."
Theo - t3.gg
youtube
"GLM-5 used zero NVIDIA chips, 745B params, and costs $1 per million input tokens. This is the future."
Legendary
x
"The hallucination rate is insane; it's much more willing to say 'I don't know' than lie to you."
DevUser456
reddit
"Zhipu AI just dropped the gauntlet for open source coding models."
AIExplorer
hackernews
"Finally, an open weight model that doesn't lose its mind halfway through a complex task."
CodeMaster
reddit

Videos About GLM-5

Watch tutorials, reviews, and discussions about GLM-5

It is by far the best open-weight model I have seen, especially for code stuff.

The fact that this is the first open-weight model I've successfully run an hour-plus job on... blew me away.

It appears to be the model that hallucinates the least of any model to date.

We are seeing a massive shift in what open weight models can actually do in production.

The stability of this model during long tool-use sessions is genuinely unprecedented.

The coding feel here is very, very potent... comparable to GLM 4.7 which was already a unicorn.

The introduction of the dynamic island in its UI mockup was a very cool, unexpected special feature.

It's outperforming almost every other model in its class for complex logic.

The reasoning depth here reminds me of the first time I used o1, but it's open weight.

For a text-only model, it handles visual logic in code better than many vision models.

Memory usage has shot down... we got 33x memory improvements compared to what we were doing previously.

It passed the car wash logic test with thinking enabled, beating out Claude and GPT-4o.

Deploying this requires a serious server rack, but the performance per watt is insane.

It handled my legacy repo migration without a single hallucinated library name.

The thinking mode isn't just a gimmick; it fundamentally changes the output quality.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for GLM-5

Expert tips to help you get the most out of GLM-5 and achieve better results.

Enable Thinking Mode

GLM-5 performs significantly better on complex logic puzzles like the 'car wash' test when reasoning is enabled.
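
Recent GLM releases expose reasoning through a thinking request parameter; the sketch below assumes GLM-5 keeps the same shape (verify the field name against the current API reference):

import { ZhipuAI } from "zhipuai-sdk";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  const response = await client.chat.completions.create({
    model: "glm-5",
    // 'thinking' is an assumed parameter shape carried over from earlier GLM releases.
    thinking: { type: "enabled" },
    // Illustrative logic-puzzle prompt in the spirit of the 'car wash' test.
    messages: [{ role: "user", content: "Two bays each wash a car in 20 minutes. How long for 7 cars?" }],
  });
  console.log(response.choices[0].message.content);
}

main();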

Leverage the MIT License

Take advantage of the permissive licensing for unrestricted commercial development and internal hosting.

Tool Use Optimization

Use GLM-5 for multi-step tasks as it is specifically purpose-built for high stability in agentic tool execution.
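
Tool definitions here follow the OpenAI-style function schema that recent GLM APIs have mirrored; a minimal sketch, where run_security_audit is a hypothetical tool defined purely for illustration:

import { ZhipuAI } from "zhipuai-sdk";

const client = new ZhipuAI({ apiKey: "YOUR_API_KEY" });

const response = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Audit the auth module for injection risks." }],
  tools: [
    {
      type: "function",
      function: {
        name: "run_security_audit", // hypothetical tool, defined for illustration
        description: "Run a static security scan over a source directory",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string", description: "Directory to scan" },
          },
          required: ["path"],
        },
      },
    },
  ],
});

// If GLM-5 elects to call the tool, execute it and return the result
// in a follow-up message with role "tool".
const call = response.choices[0].message.tool_calls?.[0];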

Context Window Utilization

Input entire codebases into the 200k window to perform repository-wide security audits or refactoring.
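
A minimal sketch of packing a repository into a single prompt using Node's fs APIs; the 4-characters-per-token heuristic is an assumption (use a real tokenizer for precise counts) to stay safely inside the 200K window:

import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Recursively gather TypeScript source files into one prompt body.
function collectSources(dir: string, out: string[] = []): string[] {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) collectSources(path, out);
    else if (path.endsWith(".ts")) out.push(`// FILE: ${path}\n${readFileSync(path, "utf8")}`);
  }
  return out;
}

const corpus = collectSources("./src").join("\n\n");

// Rough heuristic: ~4 characters per token. Leave headroom under 200K.
const approxTokens = Math.ceil(corpus.length / 4);
if (approxTokens > 180_000) throw new Error(`Repo too large: ~${approxTokens} tokens`);

const prompt = `Audit the following codebase for security issues:\n\n${corpus}`;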

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan
Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim
CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington
CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen
Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park
Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez
Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.

Related AI Models

MiniMax M2.5
MiniMax

MiniMax M2.5 is a SOTA MoE model featuring a 1M context window and elite agentic coding capabilities at disruptive pricing for autonomous agents.

1M context
$0.30/$1.20/1M

GPT-5.3 Codex
OpenAI

GPT-5.3 Codex is OpenAI's 2026 frontier coding agent, featuring a 400K context window, 77.3% Terminal-Bench score, and superior logic for complex software...

400K context
$1.75/$14.00/1M

Qwen3-Coder-Next
Alibaba

Qwen3-Coder-Next is Alibaba Cloud's elite Apache 2.0 coding model, featuring an 80B MoE architecture and 256k context window for advanced local development.

256K context
$0.14/$0.42/1M

Claude Sonnet 4.5
Anthropic

Anthropic's Claude Sonnet 4.5 delivers world-leading coding (77.2% SWE-bench) and a 200K context window, optimized for the next generation of autonomous agents.

200K context
$3.00/$15.00/1M

Qwen-Image-2.0
Alibaba

Qwen-Image-2.0 is Alibaba's unified 7B model for professional infographics, photorealism, and precise image editing with native 2K resolution and 1k-token...

1K context
$0.07/1M

Claude Opus 4.6
Anthropic

Claude Opus 4.6 is Anthropic's flagship model featuring a 1M token context window, Adaptive Thinking, and world-class coding and reasoning performance.

200K context
$5.00/$25.00/1M

Kimi K2.5
Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

262K context
$0.60/$2.50/1M

DeepSeek-V3.2-Speciale
DeepSeek

DeepSeek-V3.2-Speciale is a reasoning-first LLM featuring gold-medal math performance, DeepSeek Sparse Attention, and a 131K context window. Rivaling GPT-5...

131K context
$0.28/$0.42/1M

Frequently Asked Questions About GLM-5

Find answers to common questions about GLM-5