2026-05-07
DeepSeek V4, Kimi K2.6, GLM-5.1, MiniMax M2.7: frontier-tier coding at 5–25× lower cost
Four Chinese labs shipped frontier coding models in 12 days. MIT-licensed GLM-5.1 is statistically tied with Kimi K2.6 on SWE-Bench Pro (58.4% vs 58.6%). The cost gap vs Western frontier APIs is 5–25×.
In a 12-day window in late April 2026, four Chinese labs released open-weights coding models that reach approximately the same capability ceiling on agentic coding benchmarks — while costing 5–25× less than Western frontier API equivalents.
The four models
DeepSeek V4 Pro (DeepSeek AI) — 1.6 trillion total parameters, 49B active per forward pass, MoE architecture. Apache 2.0 license. Tops the general reasoning benchmarks among the group; slightly below the coding specialists on SWE-Bench.
Kimi K2.6 (Moonshot AI) — 80.2% SWE-Bench Verified, 58.6% SWE-Bench Pro. The highest agentic coding score in this cohort and competitive with Claude 3.7 Sonnet on code editing tasks. Commercially licensed; hosted API available.
GLM-5.1 (Z.AI / Zhipu) — 754B MoE, MIT license. 58.4% SWE-Bench Pro — statistically tied with K2.6. The MIT license is the headline: this is frontier-tier open-weights at genuinely permissive terms. Runs on 8×H100 for full precision; quantized versions run on consumer clusters.
MiniMax M2.7 (MiniMax) — 10B active parameters (from a larger MoE), 56.2% SWE-Bench Pro. The most compute-efficient entry: near-frontier coding quality at a fraction of the compute budget.
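Of the four, GLM-5.1 is the natural self-hosting candidate. As a rough sketch of what the quantized route might look like through vLLM's offline API: the repository id, quantization format, and GPU count below are placeholder assumptions, and whether vLLM supports this particular MoE architecture out of the box is something to confirm against the model card and vLLM release notes.

```python
# Sketch: offline inference with a quantized open-weights checkpoint via vLLM.
# The model id, quantization format, and GPU count are placeholder assumptions;
# verify against the actual model card before relying on this.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-5.1-AWQ",   # hypothetical repo id for a quantized variant
    quantization="awq",            # or whatever format the lab actually publishes
    tensor_parallel_size=8,        # spread the MoE weights across 8 GPUs
    max_model_len=32_768,
)

params = SamplingParams(temperature=0, max_tokens=512)
out = llm.generate(["Fix the failing test in the snippet below:\n..."], params)
print(out[0].outputs[0].text)
```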
The cost gap
On hosted APIs, GLM-5.1 and DeepSeek V4 Flash are priced at roughly $0.10–0.30 per million input tokens, versus $3–15 for comparable Western frontier models. For inference-heavy workloads such as code review, multi-file agent loops, and parallel test generation, the economics shift materially.
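As a rough illustration of how that gap compounds, the sketch below prices a hypothetical month of agent traffic at both tiers. The token volumes and the output-token prices are assumptions chosen for illustration, not provider quotes.

```python
# Back-of-the-envelope cost comparison for an inference-heavy coding workload.
# All numbers below are illustrative assumptions, not provider quotes.

def monthly_cost(input_tokens_m, output_tokens_m, in_price, out_price):
    """USD per month, given monthly token volume (in millions) and per-million prices."""
    return input_tokens_m * in_price + output_tokens_m * out_price

# Assumed workload: an agent loop doing code review and test generation for a team.
INPUT_M, OUTPUT_M = 2_000, 400  # 2B input tokens, 400M output tokens per month

# Assumed price points (USD per million tokens) at the two tiers discussed above.
open_weights_api = monthly_cost(INPUT_M, OUTPUT_M, in_price=0.20, out_price=0.80)
western_frontier = monthly_cost(INPUT_M, OUTPUT_M, in_price=5.00, out_price=15.00)

print(f"open-weights API:  ${open_weights_api:>10,.0f}/month")   # ~$720
print(f"western frontier:  ${western_frontier:>10,.0f}/month")   # ~$16,000
print(f"ratio: {western_frontier / open_weights_api:.1f}x")
```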
What this means for the market
The 12-day synchronized window appears deliberate. Collectively, these releases put pressure on Western providers to cut prices at the coding-model tier. The pattern mirrors what happened with Qwen-2.5 and DeepSeek-V3 in late 2024: open-weights capability parity forced API price cuts within 60–90 days. Expect a similar repricing cycle in H2 2026.
Practitioner note
For production agentic coding workloads: benchmark GLM-5.1 against your actual task distribution before switching — SWE-Bench Verified is a standardized test, not your codebase. The MIT license matters most if you’re building a product that embeds a coding agent (no usage restrictions, no terms-of-service risk). For self-hosting on the DGX Spark or similar clusters, GLM-5.1’s quantized variants are worth evaluating now that the capability gap to hosted frontier models has closed this far.
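A minimal way to run that benchmark is to replay a sample of your own recent tasks through both your current model and the candidate, then compare the outputs. The sketch below assumes both are reachable via OpenAI-compatible endpoints; the endpoint URLs, model ids, and task strings are placeholders to substitute with whatever your providers actually expose.

```python
# Minimal A/B harness: replay the same prompts through two OpenAI-compatible
# endpoints and compare responses side by side. URLs and model ids below are
# placeholders, not real endpoints.
from openai import OpenAI

CANDIDATES = {
    # name: (base_url, model_id) -- both placeholders
    "current": ("https://api.example-frontier.com/v1", "frontier-coder"),
    "glm-5.1": ("https://api.example-open-host.com/v1", "glm-5.1"),
}

# A handful of prompts drawn from your own backlog, not a public benchmark.
TASKS = [
    "Write a pytest regression test for the bug fixed in commit <sha>.",
    "Refactor this function to remove the global cache: ...",
]

def run(base_url: str, model: str, prompt: str) -> str:
    client = OpenAI(base_url=base_url, api_key="YOUR_KEY")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

for task in TASKS:
    print("=" * 40, "\nTASK:", task)
    for name, (url, model) in CANDIDATES.items():
        print(f"\n--- {name} ---\n{run(url, model, task)[:800]}")
```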
Sources
- DeepSeek is back among leading open-weights models with V4 — Artificial Analysis
- The late-April 2026 Chinese LLM stack — dev.to
- Best Chinese LLM benchmarks — BenchLM