Skip to main content
How I AI

GLM 5.2: why I’m replacing Opus in Claude Code with this new model

27 min episode · 2 min read
·
This New Model

Episode

27 min

Read time

2 min

Topics

Productivity, Fundraising & VC, Design & UX

AI-Generated Summary

Key Takeaways

  • Model Setup via OpenRouter: To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.
  • Cost Efficiency at Scale: A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially.
  • Benchmark Positioning: On SWE-Bench Pro, GLM 5.2 scores near GPT-4.5 and approaches Claude Opus 4.8, while outperforming Gemini 2.1 Pro. This places it firmly in frontier-model territory for coding tasks, making it a credible drop-in replacement for expensive proprietary models in agentic software engineering pipelines.
  • Agentic Task Performance: GLM 5.2 successfully ran a 45-minute autonomous session pulling 72 hours of Sentry errors and Vercel logs, generating a prioritized bug-fix plan with 14 fixes, 2 P0 issues, and suggested sequencing. It struggled with React/TypeScript mid-session but self-corrected, indicating it handles long-horizon tasks with occasional intervention needed.
  • Model Constraints to Know: GLM 5.2 is text-only — no image input or output — which limits multimodal workflows. It supports a 1-million-token context window, function calling, MCP tool use, structured output, streaming, and reasoning/thinking mode. For pure coding and text-based agentic tasks, these constraints rarely surface as blockers in practice.

What It Covers

GLM 5.2, an open-weight model from Beijing-based Z.ai, is tested as a replacement for Claude Opus 4.8 inside Claude Code and Cursor. The episode benchmarks its coding, design, and autonomous agent capabilities against frontier models, with total API costs tracked at $3.36 for 6 million tokens via OpenRouter.

Key Questions Answered

  • Model Setup via OpenRouter: To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.
  • Cost Efficiency at Scale: A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially.
  • Benchmark Positioning: On SWE-Bench Pro, GLM 5.2 scores near GPT-4.5 and approaches Claude Opus 4.8, while outperforming Gemini 2.1 Pro. This places it firmly in frontier-model territory for coding tasks, making it a credible drop-in replacement for expensive proprietary models in agentic software engineering pipelines.
  • Agentic Task Performance: GLM 5.2 successfully ran a 45-minute autonomous session pulling 72 hours of Sentry errors and Vercel logs, generating a prioritized bug-fix plan with 14 fixes, 2 P0 issues, and suggested sequencing. It struggled with React/TypeScript mid-session but self-corrected, indicating it handles long-horizon tasks with occasional intervention needed.
  • Model Constraints to Know: GLM 5.2 is text-only — no image input or output — which limits multimodal workflows. It supports a 1-million-token context window, function calling, MCP tool use, structured output, streaming, and reasoning/thinking mode. For pure coding and text-based agentic tasks, these constraints rarely surface as blockers in practice.

Notable Moment

During a live autonomous run, the model stalled repeatedly while writing React and TypeScript, prompting frustration. Then, without intervention beyond a verbal complaint to the recording, it recovered, compiled cleanly, and delivered a well-structured dark-mode bug prioritization dashboard — reversing the apparent failure entirely.

Know someone who'd find this useful?

You just read a 3-minute summary of a 24-minute episode.

Get How I AI summarized like this every Monday — plus up to 2 more podcasts, free.

Pick Your Podcasts — Free

Keep Reading

More from How I AI

We summarize every new episode. Want them in your inbox?

Similar Episodes

Related episodes from other podcasts

Explore Related Topics

This podcast is featured in Best AI Podcasts (2026) — ranked and reviewed with AI summaries.

You're clearly into How I AI.

Every Monday, we deliver AI summaries of the latest episodes from How I AI and 192+ other podcasts. Free for one show.

Start My Monday Digest

No credit card · Unsubscribe anytime