AI-Daily-Builder

2026-06-07 · $AMD · AMD / UALink Consortium · UALink / UALink-over-Ethernet scale-up fabric (Instinct MI455X Helios)

UALink's Switch-Silicon Gap: Why AMD's First Helios Racks Ship Over Ethernet

AMD's Helios MI455X rack, detailed on June 4, runs its 72-GPU scale-up domain over UALink-over-Ethernet rather than native UALink switches, because switch ASICs from Astera Labs, Auradine, Enfabrica, XConn and Upscale AI are still in validation. The result is a live test of

The signal

AMD’s “Helios” rack-scale platform built around the Instinct MI455X surfaced in detail on June 4, 2026, and the interesting part is not the GPU. It is the wire between the GPUs. The rack stitches together 72 MI455X accelerators into a single scale-up domain with roughly 260 TB/s of aggregate scale-up bandwidth, 31 TB of HBM4, and about 2,900 dense FP4 PFLOPS, fed by up to 256-core EPYC “Venice” CPUs and 43 TB/s of scale-out networking through Pensando NICs. But the first systems do not run AMD’s headline scale-up protocol natively. They run UALink-over-Ethernet (UALoE), a transport that carries UALink semantics inside standard Ethernet frames, because native UALink switch chips are, in the reporting’s words, “pending validation and qualification” by AMD’s customers.
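To put the rack-level totals in per-accelerator terms, the quoted figures can simply be divided across the 72 GPUs. The sketch below is plain arithmetic on the numbers in this article; it assumes an even split and decimal units (1 TB = 1000 GB), neither of which the reporting states, and the constant and function names are illustrative.

```python
# Rack-level figures as quoted in the article (all approximate).
GPUS_PER_RACK = 72
SCALE_UP_TB_PER_S = 260.0   # aggregate scale-up bandwidth, TB/s
HBM4_TB = 31.0              # total HBM4 capacity, TB
FP4_PFLOPS = 2900.0         # dense FP4 throughput, PFLOPS


def per_gpu(total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Even split of a rack-level total across all GPUs (an assumption)."""
    return total / gpus


print(f"scale-up BW per GPU: {per_gpu(SCALE_UP_TB_PER_S):.2f} TB/s")
print(f"HBM4 per GPU:        {per_gpu(HBM4_TB) * 1000:.0f} GB")
print(f"dense FP4 per GPU:   {per_gpu(FP4_PFLOPS):.1f} PFLOPS")
```

Under those assumptions each accelerator gets roughly 3.6 TB/s of scale-up bandwidth, about 431 GB of HBM4, and about 40 PFLOPS of dense FP4.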

That single substitution is the whole infra-IP story right now. UALink is the open, AMD-and-allies answer to Nvidia’s NVLink: a memory-semantic, low-latency fabric meant to connect up to 1,024 accelerators in one pod. The specification side has raced ahead. The silicon that makes it real has not.

Spec ahead of silicon

The UALink Consortium ratified a second wave of specifications on April 7, 2026, publishing the 200G 1.0 Data Link and Physical Layer plus additions for in-network compute (to cut inter-GPU message traffic), a chiplet definition (to embed UALink inside an SoC), and a 1.0 manageability spec (gRPC, YANG, SAI, Redfish). Notably, the 2.0 common specification landed before any 1.0 silicon shipped. The consortium chair was candid that versions 1.0 and 2.0 “won’t be full competitors to Nvidia,” with parity targeted only at version 3.0, expected roughly a year out.

The hardware calendar is the constraint. Per consortium guidance, 1.0 silicon reaches labs in the second half of 2026, appears in 2027, and lands in products later that year. Practical adoption depends on a short list of merchant switch vendors — Astera Labs, Auradine, Enfabrica, and XConn (now inside Marvell after its roughly $540M acquisition) — and on startups like Upscale AI, whose “SkyHammer” scale-up fabric ASIC (backed by a $200M Series A announced January 21, 2026) is slated for sample shipments at the end of 2026 and volume in 2027. Until those parts qualify, a UALink-native switched rack has no switch to put in it.

Why Ethernet is the stopgap

This is where Ethernet walks in. The case for carrying scale-up traffic over Ethernet is simply that it exists, ships in volume, and shares one operational toolchain for monitoring, telemetry, and debug across scale-up and scale-out. Broadcom has pushed this line aggressively with its Tomahawk Ultra positioning — a 51.2 Tb/s switch claiming roughly 250 ns latency and support for 1,024-plus accelerators over “scale-up Ethernet” — and has argued you should not wait on “some spec that’s under development that maybe you’ll have a chip a couple of years from now.”

The counter-argument is equally concrete. Ethernet was designed as general-purpose networking, not as an accelerator memory fabric, so UALoE can carry higher latency, more protocol overhead, and less deterministic behavior than a purpose-built switched UALink fabric. For training and large-context inference, where collective operations are sensitive to tail latency, “less deterministic” is not a footnote — it is throughput left on the floor. AMD’s first Helios systems are, in effect, a real-world A/B test: ship over Ethernet now, swap to native UALink switches when they qualify, and let customers measure the delta.
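Why determinism matters can be shown with a toy model. A synchronous collective step finishes only when the slowest participant's message arrives, so its duration is the maximum over all ranks, not the average. The sketch below compares two hypothetical fabrics with the same base latency but different jitter. The exponential jitter model and every number in it are illustrative assumptions, not measurements of UALoE or native UALink.

```python
import random
import statistics


def collective_step_us(rng: random.Random, ranks: int,
                       base_us: float, jitter_us: float) -> float:
    """One synchronous step: completes when the slowest rank's message lands."""
    # Each rank sees base latency plus exponentially distributed jitter
    # (mean = jitter_us). The step time is the max across ranks.
    return max(base_us + rng.expovariate(1.0 / jitter_us) for _ in range(ranks))


def mean_step_us(ranks: int, base_us: float, jitter_us: float,
                 trials: int = 2000, seed: int = 0) -> float:
    """Average step time over many simulated steps, with a fixed seed."""
    rng = random.Random(seed)
    steps = [collective_step_us(rng, ranks, base_us, jitter_us)
             for _ in range(trials)]
    return statistics.fmean(steps)


# Hypothetical fabrics: identical 1.0 us base latency, different jitter.
tight = mean_step_us(ranks=72, base_us=1.0, jitter_us=0.05)
loose = mean_step_us(ranks=72, base_us=1.0, jitter_us=0.5)
print(f"low-jitter fabric,  72 ranks: {tight:.2f} us per step")
print(f"high-jitter fabric, 72 ranks: {loose:.2f} us per step")
```

In this model the per-step gap between the two fabrics widens as the rank count rises, because a larger group is more likely to contain at least one straggler. That is the sense in which variance, rather than mean latency, sets collective throughput.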

| Item | Detail |
| --- | --- |
| Platform | AMD “Helios” rack, 72x Instinct MI455X |
| Scale-up transport (initial) | UALink-over-Ethernet (UALoE) |
| Aggregate scale-up bandwidth | About 260 TB/s |
| HBM4 per rack | 31 TB |
| Dense FP4 | About 2,900 PFLOPS |
| Native UALink switch ETA | Labs H2 2026, products 2027 |
| Merchant switch vendors | Astera Labs, Auradine, Enfabrica, XConn (Marvell), Upscale AI |

Practitioner note

If you are sizing a 2026-2027 accelerator buildout, treat “UALink-capable” and “UALink-switched” as two different purchase decisions. A platform can be UALink-capable at the accelerator endpoint while its first shipping fabric is Ethernet-based; the native switched configuration may be a later SKU gated on third-party silicon qualification. Ask vendors three questions: which switch ASIC and stepping the native config depends on, the qualification window, and whether collective-latency benchmarks were run on UALoE or on a native UALink switch — because the headline aggregate-bandwidth number does not tell you the tail-latency story that governs real training and inference performance.
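When a vendor does hand over collective-latency data, one simple way to read it is to tag every run with the fabric it was measured on and compare the tail against the median. The sketch below is one possible shape for that check. The `LatencyRun` record, the fabric labels, and the sample values are all hypothetical, and the nearest-rank percentile is just one common convention.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyRun:
    fabric: str                  # e.g. "UALoE" or "native-UALink" (illustrative labels)
    samples_us: Sequence[float]  # per-operation collective latencies, microseconds


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with >= pct% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_ratio(run: LatencyRun) -> float:
    """p99 divided by p50: how much worse the tail is than the typical case."""
    return percentile(run.samples_us, 99) / percentile(run.samples_us, 50)


# Made-up data: 95 fast operations and 5 slow ones.
run = LatencyRun(fabric="UALoE", samples_us=[10.0] * 95 + [30.0] * 5)
print(f"{run.fabric}: p50={percentile(run.samples_us, 50)} us, "
      f"p99={percentile(run.samples_us, 99)} us, ratio={tail_ratio(run)}")
```

Comparing this ratio between a UALoE run and a native-switch run of the same workload is more informative than comparing aggregate bandwidth, and a run with no fabric label cannot be compared at all.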

Under-considered angle

The market keeps framing this as UALink versus Ethernet, but the more durable outcome may be UALink-over-Ethernet as a permanent tier rather than a placeholder. If UALoE on a 51.2T-class switch lands “close enough” on latency for a meaningful share of inference and mid-scale training, the economic gravity of reusing one switching technology, one optics supply chain, and one operations stack across both scale-up and scale-out is hard to overcome. In that world, native UALink switch silicon does not lose so much as it gets pushed to the highest-end training pods where determinism is non-negotiable — a far smaller TAM than the merchant switch startups are currently raising against. The risk for the interconnect-IP names is not that UALink fails; it is that “good-enough Ethernet” quietly caps how much of the scale-up socket the dedicated fabric ever gets to address.


