Builder Daily

2026-05-07

DeepSeek V4, Kimi K2.6, GLM-5.1, MiniMax M2.7: frontier-tier coding at 5–25× lower cost

Four Chinese labs shipped frontier coding models in 12 days. MIT-licensed GLM-5.1 effectively ties Kimi K2.6 on SWE-Bench Pro (58.4% vs 58.6%). The cost gap is 5–25× versus Western APIs.

In a 12-day window in late April 2026, four Chinese labs released open-weights coding models that reach approximately the same capability ceiling on agentic coding benchmarks — while costing 5–25× less than Western frontier API equivalents.

The four models

DeepSeek V4 Pro (DeepSeek AI) — 1.6 trillion total parameters, 49B active per forward pass, MoE architecture. Apache 2.0 license. Tops the general reasoning benchmarks among the group; slightly below the coding specialists on SWE-Bench.

Kimi K2.6 (Moonshot AI) — 80.2% SWE-Bench Verified, 58.6% SWE-Bench Pro. The highest agentic coding score in this cohort and competitive with Claude 3.7 Sonnet on code editing tasks. Commercially licensed; hosted API available.

GLM-5.1 (Z.AI / Zhipu) — 754B MoE, MIT license. 58.4% SWE-Bench Pro — statistically tied with K2.6. The MIT license is the headline: this is frontier-tier open-weights at genuinely permissive terms. Runs on 8×H100 for full precision; quantized versions run on consumer clusters.

MiniMax M2.7 (MiniMax) — 10B active parameters (from a larger MoE), 56.2% SWE-Bench Pro. The most compute-efficient entry: near-frontier coding quality at a fraction of the compute budget.

The cost gap

On hosted APIs, GLM-5.1 and DeepSeek V4 Flash are priced at roughly $0.10–0.30 per million input tokens, versus $3–15 for comparable Western frontier models. For inference-heavy workloads — code review, multi-file agent loops, parallel test generation — the economics shift materially.
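To make the arithmetic concrete, here is a rough cost sketch. The per-token rates below are illustrative midpoints drawn from the ranges quoted above, and the 2,000M-token monthly workload is a hypothetical figure, not a published price sheet or real deployment:

```python
# Rough cost comparison for an inference-heavy agent workload.
# Rates are illustrative points inside the ranges quoted in the text;
# the workload size is a hypothetical assumption.

def monthly_cost(input_mtok: float, rate_per_mtok: float) -> float:
    """USD cost for a month's input tokens (input_mtok = millions of tokens)."""
    return input_mtok * rate_per_mtok

workload_mtok = 2_000  # hypothetical: 2,000M input tokens/month of agent traffic

open_weights = monthly_cost(workload_mtok, 0.30)  # within the $0.10–0.30/Mtok range
western_api = monthly_cost(workload_mtok, 4.50)   # within the $3–15/Mtok range

print(f"open-weights API: ${open_weights:,.0f}/mo")  # $600/mo
print(f"western frontier: ${western_api:,.0f}/mo")   # $9,000/mo
print(f"ratio: {western_api / open_weights:.0f}x")   # 15x
```

At the extremes of both ranges the ratio runs from well under 25× to well over it, which is why the headline figure is quoted as a band rather than a single number.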

What this means for the market

The 12-day synchronized window appears deliberate. Collectively, these releases put pressure on Western providers to cut prices at the coding-model tier. The pattern mirrors what happened with Qwen-2.5 and DeepSeek-V3 in late 2024: open-weights capability parity forced API price cuts within 60–90 days. Expect a similar repricing cycle in H2 2026.

Practitioner note

For production agentic coding workloads: benchmark GLM-5.1 against your actual task distribution before switching — SWE-Bench Verified is a standardized test, not your codebase. The MIT license matters most if you’re building a product that embeds a coding agent (no usage restrictions, no terms-of-service risk). For self-hosting on the DGX Spark or similar clusters, GLM-5.1’s quantized variants are worth evaluating now that the capability gap to hosted frontier models has closed this far.
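"Benchmark against your actual task distribution" can be as simple as a pass-rate harness over tasks pulled from your own repo. A minimal sketch, assuming each task carries its own pass/fail check; `run_model` stands in for whatever completion client you use (here it is a stub so the scaffold runs, and the example tasks are hypothetical):

```python
# Minimal harness sketch for comparing models on your own task set.
# `run_model` is any prompt -> completion callable; swap in a real API
# client for GLM-5.1, K2.6, etc. The tasks and stub below are placeholders.

from typing import Callable

Task = dict  # {"prompt": str, "check": Callable[[str], bool]}

def pass_rate(run_model: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose model output passes that task's check."""
    passed = sum(1 for t in tasks if t["check"](run_model(t["prompt"])))
    return passed / len(tasks)

# Hypothetical tasks drawn from your codebase, not from SWE-Bench:
tasks = [
    {"prompt": "Write a function add(a, b)",
     "check": lambda out: "def add" in out},
    {"prompt": "Replace the off-by-one loop bound with range()",
     "check": lambda out: "range(" in out},
]

stub = lambda prompt: "def add(a, b): return a + b"  # placeholder "model"
print(f"pass rate: {pass_rate(stub, tasks):.0%}")  # 50% with this stub
```

The point of the scaffold is that the checks encode what passing means in your codebase (tests green, lints clean), which is exactly what a standardized benchmark cannot measure.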

