2026-06-17

Compute for Physical AI — The Silicon Powering the Robot Ramp (Mid-2026)

Benchmarking the inference and training chips powering autonomous vehicles and humanoid robots — Jetson Thor, HW4, Dojo, EyeQ Ultra — through mid-2026.

The compute layer is the quiet constraint on physical AI timelines

The robot ramp has a silicon floor. Autonomous vehicles need onboard inference chips that fuse radar, lidar, and camera data in real time under strict power budgets. Humanoid robots need edge processors that run foundation models in a chassis weighing under 70 kilograms, unplugged. Training those models requires massive cloud or private compute clusters capable of ingesting terabytes of robot demonstration data.

Every timeline projection for Waymo, Tesla, Figure AI, or any other physical AI company is, at its base, a projection about available silicon. This article benchmarks the chips driving the field — inference hardware deployed in vehicles and robots, and the training compute behind the models.

One definitional note on TOPS: tera-operations per second (TOPS) figures in this article are quoted at INT8 precision, the usual basis for inference workloads. Training chips are rated differently, in TFLOPS at BF16 or FP16. The two are not directly comparable because training and inference are distinct workloads, so the tables below keep them separate.

Section 1 — Master silicon benchmark table

The table below covers the primary inference chips relevant to physical AI as of mid-2026. All TOPS figures are INT8 unless noted. Power figures are typical operating draw, not peak TDP unless specified. “Available” indicates general commercial availability; some chips remain on allocation or in staged rollout.

Chip	Maker	TOPS (INT8)	Power (W)	Memory / Bandwidth	Primary Use Case	Commercial Status
Jetson Orin NX	NVIDIA	100	10–25	16 GB LPDDR5, 102 GB/s	Edge robotics, drone payloads, industrial	Generally available
Jetson Thor	NVIDIA	800	~60	128 GB/s (est.)	Next-gen humanoid robots, advanced robotics	Staged / on allocation
HW4 (FSD Computer)	Tesla	1,000+ (Tesla claim)	~50–80 per unit	Custom LPDDR5	Tesla vehicle autonomy inference	In production (Model S/X/3/Y/Cybertruck/Cybercab)
HW4 Dual-Chip	Tesla	2,000+ (Tesla claim)	~100–160 combined	Two HW4 units in parallel	High-redundancy Tesla vehicles	In production
Dojo D1 tile	Tesla	N/A — training chip	~350 per tile	900 GB/s per tile	Neural net training (not inference)	Training cluster only
Snapdragon Ride Elite	Qualcomm	700+	Not fully disclosed	LPDDR5 with automotive-grade ECC	ADAS, L2+/L3 AV	In production (OEM rollout)
EyeQ Ultra	Mobileye	176	~10	Integrated LPDDR5	L4 autonomous driving inference	Available since 2025
TPU v5e	Google	N/A — training chip	~170 per chip	HBM2e, 1.6 TB/s per pod	Cloud model training (e.g., Waymo neural nets)	Google Cloud (not sold standalone)

Reading the table: TOPS figures vary significantly by how each manufacturer measures them — Tesla’s 1,000+ claim uses Tesla’s internal benchmark methodology, which may not be directly comparable to NVIDIA’s published INT8 numbers. Treat cross-vendor TOPS comparisons as directional, not precise. Power efficiency (TOPS per watt) is often a more meaningful metric for mobile and vehicle applications: HW4 achieves roughly 12–20 TOPS/W (est.), EyeQ Ultra roughly 17 TOPS/W, and Jetson Orin NX roughly 4–10 TOPS/W depending on operating point.
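The efficiency figures above follow directly from the table's TOPS and power columns. The sketch below reproduces that arithmetic in Python. It assumes the "1,000+" HW4 claim is taken at its 1,000 TOPS floor and that a single quoted power value stands for both ends of the range; the names CHIPS, tops_per_watt, and efficiency_table are illustrative only.

```python
# TOPS and typical power draw (W) as quoted in the Section 1 table.
# Power is stored as (low, high); a single quoted value is repeated.
# Assumption: Tesla's "1,000+" claim is taken at its 1,000 TOPS floor.
CHIPS = {
    "Jetson Orin NX": {"tops": 100, "watts": (10, 25)},
    "Jetson Thor": {"tops": 800, "watts": (60, 60)},
    "HW4 (single)": {"tops": 1000, "watts": (50, 80)},
    "EyeQ Ultra": {"tops": 176, "watts": (10, 10)},
}


def tops_per_watt(tops, watts):
    """Return (worst, best) TOPS/W; the high power figure gives the low ratio."""
    low_w, high_w = watts
    return tops / high_w, tops / low_w


def efficiency_table(chips=CHIPS):
    """Map each chip name to its (worst, best) TOPS/W pair."""
    return {name: tops_per_watt(c["tops"], c["watts"]) for name, c in chips.items()}


if __name__ == "__main__":
    for name, (worst, best) in efficiency_table().items():
        print(f"{name:16s} {worst:5.1f} to {best:5.1f} TOPS/W")
```

Because vendors measure TOPS differently, the ratios inherit the same caveat as the raw figures: they are directional, not precise.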

Tesla Dojo D1: Each D1 chip delivers 362 TFLOPS at BF16, and Tesla packages 25 D1 chips into one training tile. Tesla’s ExaPOD configuration, 3,000 D1 chips across 120 tiles plus switching fabric, targets roughly 1.1 exaFLOPS of aggregate training compute (3,000 × 362 TFLOPS ≈ 1.09 exaFLOPS). This is a training system, not an inference chip. It is not deployed in vehicles.

Section 2 — Who uses what: company-level compute stack

The inference chip in a vehicle or robot is only half the picture. Training compute — the cloud or private cluster used to build the model in the first place — is equally consequential. The table below maps the major physical AI companies to both layers.

Company	Onboard Inference Chip	Training Compute	Notes
Waymo	Custom ASIC (Waymo Driver chip, 5th gen)	Google Cloud TPU v4/v5 clusters	In-vehicle chip details limited; Google Cloud relationship provides training scale
Tesla	HW4 (single or dual)	Dojo + NVIDIA H100 clusters (transitional)	Active vertical integration — moving training toward Dojo; HW4 designed in-house
Figure AI	NVIDIA Jetson Thor	NVIDIA DGX / H100 clusters	Foundation model trained offboard; Thor handles onboard inference
Agility Robotics (Digit)	Intel / NVIDIA edge compute (mixed)	AWS cloud compute	Amazon ownership provides AWS infrastructure; onboard chip details limited
1X Technologies	NVIDIA Jetson Thor platform	NVIDIA DGX-based	OpenAI partnership influences model training stack
Boston Dynamics (Atlas)	Custom actuator compute + NVIDIA Isaac platform	NVIDIA Isaac Sim / cloud training	Isaac platform used for simulation-to-real transfer
Apptronik (Apollo)	NVIDIA-based edge compute	AWS / NVIDIA (est.)	Google/Samsung investment; training stack not fully disclosed

What this table reveals: Tesla and Waymo have vertically integrated or deeply partnered on both inference and training. The humanoid startup cohort — Figure, 1X, Apptronik — converges on NVIDIA Jetson Thor for inference and NVIDIA DGX infrastructure for training. This creates a single-vendor dependency with supply implications discussed in Section 5.
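The concentration can be made explicit by tallying onboard inference vendors across the table. The sketch below is a simplified transcription: "mixed" entries list every vendor the table names, and Boston Dynamics' use of the Isaac platform is counted as an NVIDIA dependency, which is an assumption. The names INFERENCE_VENDORS, vendor_counts, and sole_source are illustrative only.

```python
from collections import Counter

# Onboard inference vendors per company, transcribed from the Section 2 table.
# Assumption: Isaac platform use is counted as an NVIDIA dependency.
INFERENCE_VENDORS = {
    "Waymo": ["Waymo (custom ASIC)"],
    "Tesla": ["Tesla (HW4)"],
    "Figure AI": ["NVIDIA"],
    "Agility Robotics": ["Intel", "NVIDIA"],
    "1X Technologies": ["NVIDIA"],
    "Boston Dynamics": ["Boston Dynamics (custom)", "NVIDIA"],
    "Apptronik": ["NVIDIA"],
}


def vendor_counts(stacks):
    """Count how many companies name each vendor at least once."""
    counts = Counter()
    for vendors in stacks.values():
        counts.update(set(vendors))
    return counts


def sole_source(stacks, vendor):
    """Return companies whose only listed inference vendor is `vendor`."""
    return sorted(name for name, v in stacks.items() if v == [vendor])


if __name__ == "__main__":
    print(vendor_counts(INFERENCE_VENDORS).most_common())
    print(sole_source(INFERENCE_VENDORS, "NVIDIA"))
```

On this transcription, five of the seven companies name NVIDIA for onboard inference, and three of them list no alternative.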

Section 3 — Tesla’s vertical integration advantage

Tesla occupies a structurally different position from every other physical AI company in the compute layer. It designs both the inference chip deployed in its vehicles (HW4) and the training silicon used to build the models (Dojo D1). No other physical AI company controls both ends of this stack.

What vertical integration buys Tesla

No NVIDIA export restriction exposure on training. Dojo D1 is designed in-house by Tesla and fabricated by TSMC, so training capacity built on Dojo does not pass through NVIDIA’s supply chain. When the US government restricts export of NVIDIA H100 and A100 chips to certain markets, the portion of Tesla’s training pipeline that runs on Dojo is unaffected, although Dojo remains exposed to foundry-level supply risk. This is a strategic asymmetry that can compound as export controls evolve.

Cost per TOPS at vehicle scale. HW4 is manufactured and integrated as part of Tesla’s vehicle production line. The cost of the inference compute is amortized across the vehicle hardware margin. Buying Mobileye EyeQ Ultra or Qualcomm Snapdragon Ride Elite as a third-party component adds a vendor margin layer and creates a procurement dependency. Tesla eliminates both by designing and integrating in-house.

Training compute: Dojo vs. NVIDIA H100 cluster comparison. Tesla’s ExaPOD targets roughly 1.1 exaFLOPS of BF16 training compute across 3,000 D1 chips. Matching that peak figure with NVIDIA hardware would take on the order of 1,100 H100 GPUs (each delivering approximately 989 TFLOPS of dense BF16 at SXM5 peak). At data center pricing (est.), an H100 cluster of that scale represents tens of millions of dollars in GPU hardware, before networking, power, and cooling infrastructure. Peak throughput is only a first-order comparison, since delivered training performance also depends on memory, interconnect, and software maturity. Tesla’s Dojo is designed to deliver equivalent scale at lower total cost of ownership, though external validation of this cost claim is not publicly available.
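A quick arithmetic check on the peak-throughput comparison, sketched in Python. It takes the 362 TFLOPS per-D1 figure and the 3,000-unit ExaPOD count from this article, and assumes roughly 989 TFLOPS of dense BF16 per H100 SXM as listed on NVIDIA's public datasheet. The function names are illustrative only, and peak numbers say nothing about delivered training throughput.

```python
D1_TFLOPS_BF16 = 362      # per D1, as quoted in this article
EXAPOD_D1_COUNT = 3_000   # D1 count quoted for one ExaPOD
H100_TFLOPS_BF16 = 989    # assumption: H100 SXM dense BF16 peak (NVIDIA datasheet)

TFLOPS_PER_EXAFLOPS = 1_000_000  # 1 exaFLOPS = 10**6 TFLOPS


def aggregate_tflops(unit_tflops, count):
    """Peak aggregate throughput in TFLOPS, ignoring interconnect losses."""
    return unit_tflops * count


def units_needed(target_tflops, unit_tflops):
    """Smallest whole number of units whose combined peak meets the target."""
    return -(-target_tflops // unit_tflops)  # integer ceiling division


if __name__ == "__main__":
    exapod = aggregate_tflops(D1_TFLOPS_BF16, EXAPOD_D1_COUNT)
    print(f"ExaPOD peak: {exapod / TFLOPS_PER_EXAFLOPS:.3f} exaFLOPS")
    print(f"H100 equivalents: {units_needed(exapod, H100_TFLOPS_BF16)}")
```

On these inputs the ExaPOD peak is about 1.09 exaFLOPS, equivalent to roughly 1,100 H100s at peak.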

The trade-off: Vertical integration carries engineering risk. Dojo development has taken longer than initial public timelines suggested, and Tesla has continued to use NVIDIA H100 clusters for training during the transition. The full shift to Dojo for primary training is a stated goal, not a completed transition as of mid-2026.

Section 4 — Waymo’s compute stack

Waymo’s approach is the inverse of Tesla’s in one important respect: it does not design its own training silicon, but it has deeply integrated access to some of the most powerful training infrastructure in the industry through its parent company, Alphabet/Google.

In-vehicle inference: the Waymo Driver chip

Waymo has developed a custom ASIC for onboard inference — the fifth-generation Waymo Driver chip. The detailed specifications of this chip are not publicly disclosed, which is consistent with Waymo’s practice of protecting technical differentiation. What is known from Waymo’s public communications:

The chip handles real-time sensor fusion across the Waymo Driver sensor suite: cameras, lidar, and radar.
It runs the perception, prediction, and planning stack onboard, enabling fully driverless operation.
Each generation of the chip has improved power efficiency and processing throughput relative to the prior generation.

Waymo does not sell or license its inference chip. It is purpose-built for the vehicles in the Waymo One fleet and is not a general-purpose automotive chip.

Training: Google Cloud TPU scale advantage

Waymo trains its neural networks on Google Cloud TPU v4 and v5 infrastructure, whose pod configurations reach exaFLOP-scale compute. This gives Waymo training capacity that matches or exceeds what any humanoid startup can provision through NVIDIA cloud instances, at a cost structure that reflects the Google parent relationship rather than market rates.

The structural implication: Waymo’s training scale advantage is not something a startup can replicate by raising another round. The access to Google TPU infrastructure at cost is a structural moat. The constraint for Waymo is not compute — it is data diversity (driving miles across more cities, conditions, and edge cases) and vehicle manufacturing scale.

Section 5 — The NVIDIA bottleneck for humanoid startups

The convergence of humanoid robot startups on a single inference platform — NVIDIA Jetson Thor — creates a supply concentration risk that is not widely discussed in coverage of the humanoid robot ramp.

Why Thor is the default choice

Among commercially available edge compute modules sized for a humanoid robot, Jetson Thor offers the highest TOPS-per-watt ratio. At 800 TOPS and approximately 60W (about 13 TOPS/W), it enables onboard inference for large vision-language-action models without requiring an external compute tether. NVIDIA’s Isaac robotics platform, which spans simulation, training pipeline, and deployment tooling, integrates natively with Jetson hardware. For a startup that wants to move fast without building its own silicon team, Thor plus Isaac is the rational choice.

The allocation problem

NVIDIA Jetson Thor is a high-complexity system-on-module that competes with data center GPUs for NVIDIA’s engineering attention and for the foundry and packaging capacity NVIDIA has reserved. As of mid-2026, Jetson Thor is reported to be on allocation, meaning demand from humanoid robot manufacturers exceeds immediately available supply. This is consistent with the usual product lifecycle for a new Jetson module: initial production runs are limited, and NVIDIA manages allocation priority.

What this means for ramp timelines

For companies like Figure AI, 1X Technologies, and Apptronik — all of which depend on Jetson Thor for their onboard compute — the robot hardware ramp is partly gated by NVIDIA’s production allocation decisions. A company can design the best humanoid robot frame, train the best foundation model, and sign the best customer contracts, but if Thor modules are on a 6–12 month lead time, the physical production rate is constrained by silicon, not engineering.
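A toy model makes the gating effect concrete. The Python sketch below assumes one compute module per robot, a fixed lead time before the first modules arrive, and a flat monthly allocation afterwards. Every number in the usage example is hypothetical and chosen only for illustration, and simulate_builds is not drawn from any company's planning data.

```python
def simulate_builds(months, assembly_capacity, monthly_allocation, lead_time_months):
    """Robots built per month when each robot needs one compute module.

    Modules ordered at month 0 start arriving after `lead_time_months`,
    then arrive at `monthly_allocation` units per month. Unused modules
    carry over as inventory.
    """
    inventory = 0
    built = []
    for month in range(months):
        if month >= lead_time_months:
            inventory += monthly_allocation
        units = min(assembly_capacity, inventory)  # silicon or assembly, whichever binds
        inventory -= units
        built.append(units)
    return built


if __name__ == "__main__":
    # Hypothetical: a line able to assemble 500 robots/month, allocated
    # 200 modules/month after a 6-month lead time.
    constrained = simulate_builds(12, 500, 200, 6)
    unconstrained = simulate_builds(12, 500, 500, 0)
    print(sum(constrained), sum(unconstrained))
```

In this hypothetical, first-year output is a fifth of what the assembly line could deliver, and the shortfall comes entirely from module supply rather than engineering.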

Tesla and Waymo are insulated from this constraint. Tesla uses HW4, its own chip. Waymo uses its own custom ASIC. Neither depends on NVIDIA for in-vehicle inference. The constraint falls exclusively on the humanoid startups that chose the fast path of using commercially available NVIDIA hardware rather than investing in custom silicon — a trade-off that made sense at early stage but becomes a bottleneck at production scale.

The longer-term resolution: Humanoid robots reaching meaningful production volumes will face a make-or-buy decision on silicon. Companies that reach Series C and beyond will have the capital to explore custom ASIC development — a 3–5 year program — or to negotiate preferred allocation agreements with NVIDIA. Neither is a short-term solution. For the 2026–2028 period, the NVIDIA Thor allocation ceiling is a real constraint on how fast the humanoid robot industry can scale.

Benchmark context: this is the fifth article in the physical AI series

This tracker is the fifth in a series covering physical AI from multiple angles:

1. Operational ramp metrics: production counts, deployment scale, miles driven
2. Humanoid robot technology: hardware generations, dexterity benchmarks, foundation model capabilities
3. AV safety and regulation: California DMV data, NHTSA crash reporting, state permit maps
4. Investment and valuation: capital flows, funding rounds, implied valuations
5. Compute and silicon: this article

The compute layer sits beneath all four prior topics. The operational ramp (article 1) is partly a function of how many inference chips are available. The humanoid technology benchmarks (article 2) depend on what models can run onboard in real time. The investment picture (article 4) will ultimately be shaped by which companies control their own silicon stack and which remain dependent on third-party allocations. Silicon is not the most visible layer of physical AI — but it is the most foundational one.