
GLM-5

GLM-5 is Zhipu AI's 744B parameter open-weight powerhouse, excelling in long-horizon agentic tasks, coding, and factual accuracy with a 200k context window.

Open Weights · Agentic Engineering · MoE · Zhipu AI · Coding AI
Zhipu · GLM · February 11, 2026
Context: 200K tokens
Max Output: 128K tokens
Input Price: $1.00 / 1M tokens
Output Price: $3.20 / 1M tokens
Modality: Text
Capabilities: Tools, Streaming, Reasoning
Benchmarks
GPQA
86%
GPQA: Graduate-Level Science Q&A. A rigorous benchmark with 448 multiple-choice questions in biology, physics, and chemistry created by domain experts. PhD experts only achieve 65-74% accuracy, while non-experts score just 34% even with unlimited web access (hence 'Google-proof'). GLM-5 scored 86% on this benchmark.
HLE
30.5%
HLE: Humanity's Last Exam. A frontier benchmark of expert-written questions spanning dozens of specialized academic domains, designed to remain difficult as models saturate older tests. It requires professional-level knowledge and deep reasoning. GLM-5 scored 30.5% on this benchmark.
MMLU
88%
MMLU: Massive Multitask Language Understanding. A comprehensive benchmark with 16,000 multiple-choice questions across 57 academic subjects including math, philosophy, law, and medicine. Tests broad knowledge and reasoning capabilities. GLM-5 scored 88% on this benchmark.
AIME 2025
84%
AIME 2025: American Invitational Math Exam. Competition-level mathematics problems from the prestigious AIME exam designed for talented high school students. Tests advanced mathematical problem-solving requiring abstract reasoning, not just pattern matching. GLM-5 scored 84% on this benchmark.
MATH
97.4%
MATH: Mathematical Problem Solving. A comprehensive math benchmark testing problem-solving across algebra, geometry, calculus, and other mathematical domains. Requires multi-step reasoning and formal mathematical knowledge. GLM-5 scored 97.4% on this benchmark.
SWE-Bench
77.8%
SWE-Bench: Software Engineering Benchmark. AI models attempt to resolve real GitHub issues in open-source Python projects with human verification. Tests practical software engineering skills on production codebases. Top models went from 4.4% in 2023 to over 70% in 2024. GLM-5 scored 77.8% on this benchmark.
HumanEval
97.0%
HumanEval: Python Programming Problems. 164 hand-written programming problems where models must generate correct Python function implementations. Each solution is verified against unit tests. Top models now achieve 90%+ accuracy. GLM-5 scored 97.0% on this benchmark.
Terminal-Bench
56.2%
Terminal-Bench: Terminal/CLI Tasks. Tests the ability to perform command-line operations, write shell scripts, and navigate terminal environments. Measures practical system administration and development workflow skills. GLM-5 scored 56.2% on this benchmark.

About GLM-5

Learn about GLM-5's capabilities, features, and how it can help you achieve better results.

GLM-5 is Zhipu AI's flagship foundation model designed for autonomous agentic workflows and complex systems engineering. It utilizes a massive 744 billion parameter Mixture-of-Experts (MoE) architecture, with 40 billion parameters active during inference to balance performance and speed. The model is the first open-weight system to demonstrate parity with proprietary frontier models in software engineering tasks, scoring 77.8% on SWE-bench Verified.

The model was trained on 28.5 trillion tokens using a domestic cluster of 100,000 Huawei Ascend chips. It integrates specialized mechanisms like Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA) to maintain logical consistency across its 200,000 token context window. This technical stack allows GLM-5 to handle long-horizon planning and resource management without the high latency typical of dense models of this size.

Zhipu AI released GLM-5 under the MIT license, enabling enterprise users to deploy the weights locally for sensitive data processing. With an input cost of just $1.00 per million tokens, it offers a 6x price advantage over rival models like Claude 4.5. The model includes a dedicated Thinking Mode that reduces hallucination rates significantly compared to its predecessors.
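The Thinking Mode mentioned above is toggled per request through the OpenAI-compatible API. A minimal sketch in TypeScript; the `thinking` field name and shape here are an assumption modeled on Zhipu's earlier GLM APIs, so verify against the current documentation:

```typescript
// Builds a chat-completion payload with reasoning enabled.
// NOTE: the `thinking` field is an assumption based on Zhipu's
// GLM-4.x API; confirm the exact name in the official docs.
function buildThinkingRequest(prompt: string) {
  return {
    model: "glm-5",
    messages: [{ role: "user" as const, content: prompt }],
    thinking: { type: "enabled" as const },
  };
}

// Usage with an OpenAI-compatible client (see the API Quick Start):
// const res = await client.chat.completions.create(buildThinkingRequest("..."));
```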

GLM-5

Use Cases

Discover the different ways you can use GLM-5 to achieve great results.

Autonomous Software Engineering

Solving complex GitHub issues and performing repo-wide refactors, backed by its 77.8% score on SWE-bench Verified.

Enterprise Tool Orchestration

Executing multi-step agentic workflows across internal APIs to handle back-office automation in finance and legal sectors.

Long-Context Repository Analysis

Using the 200,000 token window to ingest and analyze entire documentation sets or multi-file codebases in a single pass.
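One way to exploit the window is to pack the files of interest into a single prompt with clear path headers. A hedged sketch; the 4-characters-per-token estimate is a rough heuristic, not Zhipu's actual tokenizer:

```typescript
// Packs several source files into one long-context prompt.
interface RepoFile {
  path: string;
  content: string;
}

const CONTEXT_BUDGET_TOKENS = 200_000; // GLM-5's advertised window

function buildRepoPrompt(files: RepoFile[], question: string): string {
  const body = files
    .map((f) => `### File: ${f.path}\n${f.content}`)
    .join("\n\n");
  const prompt = `${body}\n\nQuestion: ${question}`;
  // Rough size check: ~4 characters per token for English/source text.
  const approxTokens = Math.ceil(prompt.length / 4);
  if (approxTokens > CONTEXT_BUDGET_TOKENS) {
    throw new Error(`Prompt is ~${approxTokens} tokens; exceeds the 200K window`);
  }
  return prompt;
}
```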

Personal AI Coworkers

Powering open-source agents like OpenClaw to manage emails, calendars, and background tasks 24/7 with high reliability.

On-Premise Private Intelligence

Deploying the open-weight model locally under its MIT license to ensure full data privacy for sensitive corporate operations.

Cost-Efficient Agent Scaling

Running high-volume agentic sessions at 6-8x lower costs compared to proprietary frontier models without sacrificing reasoning depth.
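The 6-8x figure follows from simple arithmetic on the list prices shown on this page ($1.00/$3.20 per 1M tokens for GLM-5 versus $5.00/$25.00 for Claude Opus 4.5 in the Related Models section); a worked example:

```typescript
// Monthly API cost for a given traffic mix, in USD.
function monthlyCostUSD(
  inputM: number,   // millions of input tokens per month
  outputM: number,  // millions of output tokens per month
  inPrice: number,  // $ per 1M input tokens
  outPrice: number, // $ per 1M output tokens
): number {
  return inputM * inPrice + outputM * outPrice;
}

const glm5 = monthlyCostUSD(500, 100, 1.0, 3.2);  // 500 + 320 = $820
const opus = monthlyCostUSD(500, 100, 5.0, 25.0); // 2500 + 2500 = $5,000
// At this input/output mix the gap is ~6.1x, consistent with the 6-8x claim.
```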

Strengths

Limitations

Elite Coding Performance: Achieves a 77.8% score on SWE-bench Verified, matching proprietary giants like Claude Opus for autonomous software engineering.
No Native Vision: The model cannot process images or other visual inputs directly, which limits its use in modern multimodal UI/UX workflows.
6x Price Advantage: Offers frontier-level reasoning at just $1.00 per 1M input tokens, making high-scale agentic deployments economically viable.
Terminal Task Lag: Performance on Terminal-Bench 2.0 sits at 56.2%, trailing slightly behind the absolute top-tier proprietary competitors.
MIT Licensed Weights: Full open-weight availability on Hugging Face allows for private local deployment on Huawei Ascend or NVIDIA hardware.
Hallucination Frequency: Early benchmarks show hallucination rates near 30% for specific complex reasoning tasks compared to lower rates in top rivals.
Massive Context Capacity: The 200K token window coupled with 128K output tokens is ideal for repository-wide analysis and long-form generations.
Hardware Variances: Training on Huawei Ascend hardware may lead to minor performance variances when deployed on standard NVIDIA-only software stacks.

API Quick Start

zai/glm-5

View Documentation
zhipu SDK (OpenAI-compatible)
import OpenAI from "openai";

// Zhipu's endpoint is OpenAI-compatible, so the standard SDK works.
const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  baseURL: "https://open.bigmodel.cn/api/paas/v4/",
});

const response = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Analyze this repo structure and refactor to GraphQL." }],
  stream: true, // stream tokens as they are generated
});

// Print each streamed chunk as it arrives.
for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Install the SDK and start making API calls in minutes.

Community Feedback

See what the community thinks about GLM-5

GLM-5 is an open-source 744B parameter model that performs near Claude Opus level on coding... but the price difference matters.
Odd-Coconut-2067
reddit
The 200,000 token window changes your workflow: Analyze 20+ files for a single refactor or review complex PR diffs in one pass.
AskCodi
reddit
I went from spending ~$90/month on Claude API calls to under $15 with GLM-5 and didn't notice a meaningful drop in quality.
IulianHI
reddit
Its hallucination rate is in the 30% range versus I don't know Gemini 3 Pro at 88%.
Sid
youtube
GLM-5 dropped before I could finish testing 4.7, and the reasoning jump is actually noticeable in everyday coding.
able_wong
twitter
Zhipu releasing this under MIT is a massive move for the local LLM community.
dev_tester
twitter

Related Videos

Watch tutorials, reviews, and discussions about GLM-5

It's neck and neck with models like 5.2 Codex and Opus 4.5.

It is the first open-weight model I've successfully run an hour-plus job on without issues.


The reasoning density is significantly higher than GLM-4.

It basically replaces Claude 3.5 Sonnet for my internal coding tasks.

They almost doubled the number of parameters... all the way up to 744B.

Even though it's a lot larger, it runs pretty much as fast as, if not faster than, the older model.


The sparse attention mechanism keeps memory usage low for such a big model.

Open-weight availability makes this the new champion for local hosting.

They created their own RL engine called Slime.

A 200,000 context window changes what enterprise AI even means.

It hits 77.8 on SWE-bench verified, beating Gemini 3 Pro at 76.2.

Zhipu AI is proving domestic hardware can train world-class models.

Agentic engineering is the key focus here, not just simple chat.

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips

Expert tips to help you get the most out of GLM-5 and achieve better results.

Activate Agentic Mode

Define multi-step plans in your prompts as GLM-5 is optimized for autonomous engineering rather than simple chat responses.
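One way to apply this tip is to seed the request with an explicit numbered plan. A sketch; the plan format is illustrative, not a Zhipu requirement:

```typescript
// Builds a messages array that frames GLM-5 as an agent with a plan.
function buildAgentMessages(goal: string, steps: string[]) {
  const plan = steps.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return [
    {
      role: "system" as const,
      content:
        "You are an autonomous engineering agent. Follow the plan step by step and report progress after each step.",
    },
    { role: "user" as const, content: `Goal: ${goal}\n\nPlan:\n${plan}` },
  ];
}
```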

Local Hardware Allocation

Ensure significant VRAM or native Huawei Ascend hardware with the MindSpore framework is available for optimal throughput.

Implement Fallback Chains

Configure GLM-5 as your primary reasoning model with GLM-4.7-Flash as a cost-effective fallback for simpler instructions.
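A fallback chain can be written generically over any completion function; a minimal sketch (the `glm-4.7-flash` model id is taken from the tip above and should be verified against the docs):

```typescript
// Any function that completes a prompt against a named model.
type Completer = (model: string, prompt: string) => Promise<string>;

// Tries each model in order, returning the first success.
async function completeWithFallback(
  complete: Completer,
  prompt: string,
  models: string[] = ["glm-5", "glm-4.7-flash"],
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await complete(model, prompt);
    } catch (err) {
      lastError = err; // primary failed; try the next, cheaper model
    }
  }
  throw lastError;
}
```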

Use Structured Output

GLM-5 excels at generating precise .docx and .xlsx formats when given clear schema requirements for deliverables.
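For JSON deliverables (which can then be converted to .docx or .xlsx downstream), the OpenAI-compatible `response_format` field with a JSON schema is the usual mechanism; whether GLM-5 honors `json_schema` exactly as OpenAI does is an assumption to verify against Zhipu's docs:

```typescript
// Builds a request that constrains output to a simple report-row schema.
// The schema itself is illustrative.
function buildStructuredRequest(prompt: string) {
  return {
    model: "glm-5",
    messages: [{ role: "user" as const, content: prompt }],
    response_format: {
      type: "json_schema" as const,
      json_schema: {
        name: "report_row",
        schema: {
          type: "object",
          properties: {
            title: { type: "string" },
            amount: { type: "number" },
          },
          required: ["title", "amount"],
        },
      },
    },
  };
}
```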

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


Related AI Models


GPT-5.2

OpenAI

GPT-5.2 is OpenAI's flagship model for professional tasks, featuring a 400K context window, elite coding, and deep multi-step reasoning capabilities.

400K context
$1.75/$14.00/1M

Gemini 3.1 Flash-Lite

Google

Gemini 3.1 Flash-Lite is Google's fastest, most cost-efficient model. Features 1M context, native multimodality, and 363 tokens/sec speed for scale.

1M context
$0.25/$1.50/1M

Claude Opus 4.5

Anthropic

Claude Opus 4.5 is Anthropic's most powerful frontier model, delivering record-breaking 80.9% SWE-bench performance and advanced autonomous agency for coding.

200K context
$5.00/$25.00/1M

Kimi K2 Thinking

Moonshot

Kimi K2 Thinking is Moonshot AI's trillion-parameter reasoning model. It outperforms GPT-5 on HLE and supports 300 sequential tool calls autonomously for...

256K context
$0.60/$2.50/1M

Grok-4

xAI

Grok-4 by xAI is a frontier model featuring a 2M token context window, real-time X platform integration, and world-record reasoning capabilities.

2M context
$3.00/$15.00/1M

Kimi K2.5

Moonshot

Discover Moonshot AI's Kimi K2.5, a 1T-parameter open-source agentic model featuring native multimodal capabilities, a 262K context window, and SOTA reasoning.

256K context
$0.60/$3.00/1M

GPT-5.4

OpenAI

GPT-5.4 is OpenAI's frontier model featuring a 1.05M context window and Extreme Reasoning. It excels at autonomous UI interaction and long-form data analysis.

1M context
$2.50/$15.00/1M

GPT-5.1

OpenAI

GPT-5.1 is OpenAI’s advanced reasoning flagship featuring adaptive thinking, native multimodality, and state-of-the-art performance in math and technical...

400K context
$1.25/$10.00/1M

Frequently Asked Questions

Find answers to common questions about GLM-5