AI-Daily-Builder

2026-05-20

Alibaba T-Head Zhenwu M890 — 144GB domestic AI accelerator, 3x prior gen

Read this because: the number that matters is 560,000 units already shipped, so this is not a paper launch. China's domestic accelerator stack is at volume, and the M890's agent-workload tuning shows the decoupling is now targeting the same workloads NVIDIA sells into.

Alibaba T-Head unveiled the Zhenwu M890: 144GB memory, 800GB/s interchip, 3x the 810E. 560K Zhenwu units shipped to 400+ customers. V900 in 2027.

Alibaba’s chip subsidiary T-Head unveiled the Zhenwu M890 AI accelerator at an event in Hangzhou (May 19-20). The spec sheet is competitive — but the number that actually matters is buried lower: 560,000 Zhenwu units already shipped to 400+ customers across 20 industries. This is a volume program, not a paper launch.

Specs

| Metric | Zhenwu M890 |
| --- | --- |
| GPU memory | 144 GB |
| Interchip bandwidth | 800 GB/s |
| Performance vs Zhenwu 810E | 3x |
| Workload focus | Training and inference, tuned for agentic tasks |
| Companion model | Qwen 3.7-Max (runs 35h continuous) |

The roadmap

T-Head laid out a multi-year cadence, with the V900 slated for 2027.

A published multi-year roadmap is itself a signal: it tells Chinese hyperscalers and enterprises they can plan around a domestic supply line rather than gambling on NVIDIA export-license availability.

Why this matters

Three reads:

  1. Volume is real. 560K units shipped puts T-Head past the “demo” phase. The domestic Chinese accelerator market — Huawei Ascend, Cambricon, and now T-Head Zhenwu at scale — is a genuine second supply ecosystem, not aspirational.
  2. Agent-workload tuning is the tell. The M890 is explicitly tuned for agentic tasks and paired with a model (Qwen 3.7-Max) that runs for 35 hours continuously. China’s stack is now targeting the same high-value workloads NVIDIA sells into, not just cheaper inference.
  3. 144GB is an HBM-class capacity. It competes with high-end Western accelerators on the memory-bound workloads (large-context inference, agent state) that increasingly define AI economics.
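
To make the memory-bound point concrete, here is a rough back-of-envelope sketch of what a 144 GB budget can hold. Only the 144 GB figure comes from the spec sheet. The model shape below (70B parameters, 80 layers, 8 KV heads, head dimension 128, 128K-token context) is a hypothetical example, not a published Qwen or M890 configuration, and the sketch ignores activations, runtime overhead, and multi-chip sharding.

```python
GB = 10**9  # decimal gigabytes; assumed to match how the "144 GB" spec is quoted


def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the model weights at a given precision."""
    return params * bytes_per_param


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for one sequence's key/value cache.

    The leading factor of 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def fits(total_bytes: int, capacity_gb: int = 144) -> bool:
    """True if the footprint fits in a single device of the given capacity."""
    return total_bytes <= capacity_gb * GB


# Hypothetical 70B-parameter model (illustrative numbers only).
PARAMS = 70_000_000_000
kv = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_tokens=128_000)

for label, bytes_per_param in [("8-bit weights", 1), ("16-bit weights", 2)]:
    total = weight_bytes(PARAMS, bytes_per_param) + kv
    print(f"{label}: {total / GB:.1f} GB, fits in 144 GB: {fits(total)}")
```

Under these assumptions, the 8-bit variant plus one long-context KV cache fits on a single 144 GB device, while the 16-bit variant does not. That is the sense in which capacity, rather than raw compute, can decide how large-context and agent workloads are deployed.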

Practitioner note

The under-considered angle: the decoupling narrative usually focuses on training, but the M890 is tuned for agents and inference, the workloads that scale with deployment rather than research. If China’s domestic stack is competitive on inference economics, the long-run substitution is structurally larger than the training-chip headlines suggest, because inference is where the volume lives.

