- Z.ai released the GLM-5.2 AI model, which performs within 1% of Claude Opus 4.8 on key engineering benchmarks while beating GPT-5.5.
- The model was trained entirely on Huawei Ascend chips with no NVIDIA hardware, at an estimated cost of around $25 million.
- Available under an MIT license, it offers a 1-million-token context window and API pricing significantly cheaper than major competitors.
- Unsloth AI has released 2-bit quantizations, shrinking the model to 238GB, though it still requires 256GB of RAM or VRAM to run locally.
On June 16, the Beijing-based lab Z.ai dropped its powerful new GLM-5.2 AI model, which immediately challenged industry leaders. The company, on the U.S. Entity List since January 2025, has seen its stock surge 90% to a new all-time high following this release and the U.S. ban on Anthropic Fable.
Consequently, the model’s benchmark performance validates the hype. On the FrontierSWE test for multi-hour engineering projects, GLM-5.2 scored 74.4, just behind Claude Opus 4.8’s 75.1 and ahead of GPT-5.5’s 72.6.
Meanwhile, it also excelled on other metrics, scoring 62.1 on SWE-bench Pro and becoming the best open-source model in the Artificial Analysis Intelligence Index. OpenRouter’s benchmarks place it in the same category as the banned Claude Fable 5.
A key differentiator is its hardware independence, having been built entirely on Huawei Ascend chips. Stability AI founder Emad Mostaque estimated total training costs at around $25 million, which is extremely cheap compared to peers.
The technical leap includes a 744-billion-parameter architecture and a genuine 1-million-token context window, five times larger than its predecessor. This operational shift enables whole-repo navigation and multi-file refactors as single-call workflows for developers.
For access, API pricing is set at $1.40 per million input tokens and $4.40 per million output. The Coding Plan starts at around $18 a month and integrates with popular developer environments like Claude Code and Cline.
Local deployment is now technically possible thanks to work from Unsloth AI, which released 2-bit GGUF quantizations that compress the model from 1.51TB to 238GB. However, this still demands 256GB of unified memory or a matching RAM/VRAM combination.
In practical tests, the model generated a playable game with highly varied scenarios and enemy types, showcasing strong output diversity. This makes it economically compelling for multi-shot generation workflows where diversity matters more than polished interfaces.
The open-source weights are now live on HuggingFace under the permissive MIT license, with quantized versions also available. GLM Coding Plan subscribers can switch to the model immediately, and it’s available for free testing on z.AI with some usage constraints.
✅ Follow BITNEWSBOT on Telegram, Facebook, LinkedIn, X.com, and Google News for instant updates.
