What are the key takeaways from this How I AI episode?

Key insights include: **Model Setup via OpenRouter:** To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.; **Cost Efficiency at Scale:** A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially.; **Benchmark Positioning:** On SWE-Bench Pro, GLM 5.2 scores near GPT-4.5 and approaches Claude Opus 4.8, while outperforming Gemini 2.1 Pro. This places it firmly in frontier-model territory for coding tasks, making it a credible drop-in replacement for expensive proprietary models in agentic software engineering pipelines.

What did This New Model discuss on How I AI?

GLM 5.2, an open-weight model from Beijing-based Z.ai, is tested as a replacement for Claude Opus 4.8 inside Claude Code and Cursor. The episode benchmarks its coding, design, and autonomous agent capabilities against frontier models, with total API costs tracked at $3.36 for 6 million tokens via OpenRouter. Key topics include: **Model Setup via OpenRouter:** To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.; **Cost Efficiency at Scale:** A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially..

How long is this episode of How I AI?

This episode is 27 minutes long. SignalCast provides an AI-generated summary so you can get the key insights in about 3 minutes.

How I AI

GLM 5.2: why I’m replacing Opus in Claude Code with this new model

June 24, 2026

27 min episode · 2 min read

This New Model

Episode

27 min

Read time

2 min

Topics

Productivity, Fundraising & VC, Design & UX

AI-Generated Summary

Published Jun 24, 2026

Key Takeaways

✓Model Setup via OpenRouter: To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.
✓Cost Efficiency at Scale: A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially.
✓Benchmark Positioning: On SWE-Bench Pro, GLM 5.2 scores near GPT-4.5 and approaches Claude Opus 4.8, while outperforming Gemini 2.1 Pro. This places it firmly in frontier-model territory for coding tasks, making it a credible drop-in replacement for expensive proprietary models in agentic software engineering pipelines.
✓Agentic Task Performance: GLM 5.2 successfully ran a 45-minute autonomous session pulling 72 hours of Sentry errors and Vercel logs, generating a prioritized bug-fix plan with 14 fixes, 2 P0 issues, and suggested sequencing. It struggled with React/TypeScript mid-session but self-corrected, indicating it handles long-horizon tasks with occasional intervention needed.
✓Model Constraints to Know: GLM 5.2 is text-only — no image input or output — which limits multimodal workflows. It supports a 1-million-token context window, function calling, MCP tool use, structured output, streaming, and reasoning/thinking mode. For pure coding and text-based agentic tasks, these constraints rarely surface as blockers in practice.

What It Covers

GLM 5.2, an open-weight model from Beijing-based Z.ai, is tested as a replacement for Claude Opus 4.8 inside Claude Code and Cursor. The episode benchmarks its coding, design, and autonomous agent capabilities against frontier models, with total API costs tracked at $3.36 for 6 million tokens via OpenRouter.

Key Questions Answered

•Model Setup via OpenRouter: To run GLM 5.2 in Cursor, add your OpenRouter API key to the OpenAI key field, override the base URL with `openrouter.ai/api/v1/cursor` (the `/cursor` suffix is undocumented but required), then add `z-ai/glm-5.2` as a custom model. Claude Code requires editing `~/.zshrc` and `~/.claude/settings.json` to reroute all model calls.
•Cost Efficiency at Scale: A 45-minute autonomous coding session consuming roughly 6 million tokens cost $3.36 on OpenRouter, with a 72% cache hit rate. Comparable tasks using Claude Opus 4.8 or GPT-4.5 would cost significantly more. For high-volume coding workflows, switching to GLM 5.2 via a third-party inference provider can reduce API spend substantially.
•Benchmark Positioning: On SWE-Bench Pro, GLM 5.2 scores near GPT-4.5 and approaches Claude Opus 4.8, while outperforming Gemini 2.1 Pro. This places it firmly in frontier-model territory for coding tasks, making it a credible drop-in replacement for expensive proprietary models in agentic software engineering pipelines.
•Agentic Task Performance: GLM 5.2 successfully ran a 45-minute autonomous session pulling 72 hours of Sentry errors and Vercel logs, generating a prioritized bug-fix plan with 14 fixes, 2 P0 issues, and suggested sequencing. It struggled with React/TypeScript mid-session but self-corrected, indicating it handles long-horizon tasks with occasional intervention needed.
•Model Constraints to Know: GLM 5.2 is text-only — no image input or output — which limits multimodal workflows. It supports a 1-million-token context window, function calling, MCP tool use, structured output, streaming, and reasoning/thinking mode. For pure coding and text-based agentic tasks, these constraints rarely surface as blockers in practice.

Notable Moment

During a live autonomous run, the model stalled repeatedly while writing React and TypeScript, prompting frustration. Then, without intervention beyond a verbal complaint to the recording, it recovered, compiled cleanly, and delivered a well-structured dark-mode bug prioritization dashboard — reversing the apparent failure entirely.

Know someone who'd find this useful?

You just read a 3-minute summary of a 24-minute episode.

Get How I AI summarized like this every Monday — plus up to 2 more podcasts, free.

Pick Your Podcasts — Free

Similar Episodes

Related episodes from other podcasts

The AI Breakdown

Jun 22

Explore Related Topics

⚡Productivity 💰Fundraising & VC 🎨Design & UX

This podcast is featured in Best AI Podcasts (2026) — ranked and reviewed with AI summaries.

You're clearly into How I AI.

Every Monday, we deliver AI summaries of the latest episodes from How I AI and 192+ other podcasts. Free for one show.

Start My Monday Digest

No credit card · Unsubscribe anytime

GLM 5.2: why I’m replacing Opus in Claude Code with this new model

AI-Generated Summary

Key Takeaways

What It Covers

Key Questions Answered

Notable Moment

Keep Reading

How Claude Mythos found a 15-year-old bug in Mozilla Firefox | Brian Grinstead

Why AI Users Are Raving About GLM 5.2

How to design AI agent loops: schedules, goals, and subagents in Claude Code and Codex

OpenAI Misses Targets, Codex vs Claude, Elon vs Sam Trial, Big Hyperscaler Beats, Peptide Craze

More from How I AI

How Claude Mythos found a 15-year-old bug in Mozilla Firefox | Brian Grinstead

How to design AI agent loops: schedules, goals, and subagents in Claude Code and Codex

How Braintrust uses AI agents, evals, and CI to ship better software | Ankur Goyal

Claude Fable 5 review: what the new Mythos model gets right (and very wrong)

Shopping with Claude: How to find quality brands, automate returns, and buy things that last 100 years | Nicole Ruiz

Similar Episodes

Why AI Users Are Raving About GLM 5.2

OpenAI Misses Targets, Codex vs Claude, Elon vs Sam Trial, Big Hyperscaler Beats, Peptide Craze

Opus 4.6 Tops Benchmarks, ChatGPT Market Share Decline, and the Privacy Breakdown | EP 228

Claude Opus 4.6 vs GPT-5.3 Codex: Live Build, Clear Winner

Opus 4.6 and ChatGPT 5.3-Codex Are Here and the Labs Are at War

Explore Related Topics

You're clearly into How I AI.