2026-06-18

Physical AI Compute Infrastructure — Tesla Dojo vs Google TPU vs NVIDIA H100 in the AV Training Arms Race

Tesla bets on Dojo custom silicon at $1/FLOP target while Waymo inherits Google TPU scale; both outpace NVIDIA-dependent rivals on training iteration speed.

Article 130 in the Physical AI Benchmark Series — Physical AI Compute Infrastructure: Tesla Dojo vs Google TPU vs NVIDIA H100/H200 in the AI Training Arms Race Behind FSD, Waymo’s Neural Nets, and Optimus Policy Learning

The AI models that power FSD, Waymo’s perception system, and Optimus policy learning are trained on massive compute clusters. Compute infrastructure determines how fast each company can iterate — how quickly it can train a new model, run ablation experiments, and deploy improvements to the fleet. Tesla bet on custom silicon (Dojo D1 chips and ExaPOD clusters). Waymo inherits Google’s world-class TPU infrastructure. Every other AV and robotics company rents NVIDIA H100/H200 clusters. This article maps compute infrastructure as a Physical AI benchmark dimension.

All figures labeled “(est.)” are derived from public market information, company disclosures, and analyst estimates rather than verified primary data.

Section 1 — Tesla Dojo: The Custom Silicon Bet

Tesla’s Dojo program is the most ambitious custom-silicon bet in the autonomous-vehicle industry. Rather than renting NVIDIA compute from cloud providers, Tesla designed its own training chip (the Dojo D1) and assembled it into ExaPOD clusters dedicated entirely to FSD training, auto-labeling, and Optimus policy learning.

Metric	Dojo D1 Chip	Dojo ExaPOD (Training Cluster)	Status (mid-2026)
Architecture	Custom TSMC 7nm; 362 TFLOPS BF16; 10 TB/s on-chip bandwidth (disclosed by Tesla)	25 D1 chips per training tile; 120 tiles (3,000 chips) per ExaPOD	Custom design; no GPU vendor dependency
Target compute	1 ExaFLOP per ExaPOD cluster (target, disclosed by Tesla)	Multiple ExaPODs = multiple ExaFLOPs total	Approximately 1 ExaFLOP achieved (est.) per Tesla AI Day disclosures
Training purpose	FSD neural net (video to driving policy); Occupancy Network; Auto-Labeling pipeline	Full FSD training runs: takes raw video from 6M+ fleet and produces updated FSD model	FSD v12/v13/v14 trained on Dojo (est.)
Optimus use	Optimus policy learning (manipulation, navigation) shares same Dojo infrastructure (est.)	Humanoid policy requires more diverse data than driving — higher compute per improvement (est.)	Early Optimus training on Dojo (est.); expanding
Cost vs NVIDIA alternative	Musk has cited $1/training FLOP target vs approximately $3-4/FLOP for rented NVIDIA H100 clusters (est.)	If achieved: approximately 3-4x cost advantage per training run vs cloud GPU	Advantage depends on Dojo utilization rate and yield
Risk	Custom silicon yield risk; TSMC 7nm now mature but Dojo architecture proprietary; if chip design has bugs, slow to fix	Single-source dependency on Tesla’s own chip team	Key risk: NVIDIA H100 clusters available now and at scale; Dojo buildout has had delays
Strategic value	If Dojo delivers on cost target: Tesla trains FSD faster and cheaper than any competitor who rents compute	Training speed = model iteration speed = disengagement rate improvement rate	The Dojo bet converts capex into a durable cost moat
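
As a quick sanity check on the table above, the per-chip and per-cluster figures can be multiplied out. The sketch below is illustrative only: it uses the 362 TFLOPS and 3,000-chip figures quoted above, and the `utilization` parameter is a hypothetical knob (not a Tesla disclosure) for translating peak throughput into sustained throughput.

```python
# Back-of-envelope check: do 3,000 D1 chips at 362 TFLOPS reach ~1 ExaFLOP?
D1_PEAK_TFLOPS_BF16 = 362        # per-chip peak, from the table above
CHIPS_PER_EXAPOD = 3_000         # per-ExaPOD chip count, from the table above
TFLOPS_PER_EXAFLOP = 1_000_000   # 1 ExaFLOP = 10^18 FLOPS = 10^6 TFLOPS


def exapod_exaflops(utilization: float = 1.0) -> float:
    """Cluster throughput in ExaFLOPS.

    utilization is a hypothetical fraction of peak that is actually
    sustained; 1.0 reproduces the peak number.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    peak = D1_PEAK_TFLOPS_BF16 * CHIPS_PER_EXAPOD / TFLOPS_PER_EXAFLOP
    return peak * utilization


if __name__ == "__main__":
    print(f"peak: {exapod_exaflops():.3f} ExaFLOPS")
    print(f"at 40% utilization (assumed): {exapod_exaflops(0.4):.3f} ExaFLOPS")
```

The peak works out to roughly 1.09 ExaFLOPS, which is consistent with the 1 ExaFLOP target; what a training job actually sustains depends on a utilization figure the public disclosures do not give.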

The Dojo thesis rests on a straightforward cost equation: if Tesla can train a model for $1/FLOP rather than $3-4/FLOP on rented H100 clusters, each FSD training iteration costs one-quarter to one-third as much as the equivalent run at a competitor. Over hundreds of training runs per year, that cost advantage becomes an iteration-speed advantage, because more experiments per dollar means faster convergence to a better model. The critical uncertainty is whether Dojo meets its utilization and yield targets; custom silicon programs historically run 12-24 months late and incur higher-than-expected per-chip costs in early production (est.).
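
The cost equation can be sketched in a few lines. This is a minimal illustration, not a model of Tesla's actual economics: costs are expressed in the article's relative units (1 for the Dojo target, 3 to 4 for rented H100), and the utilization values are hypothetical. It assumes owned hardware is paid for whether or not it is busy, so idle time raises the effective cost, while rented capacity is billed only when used.

```python
def effective_cost(nominal_cost: float, utilization: float) -> float:
    """Cost per unit of useful compute for owned hardware.

    Assumption: idle capacity is still paid for, so the effective
    cost is the nominal cost divided by the utilization fraction.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return nominal_cost / utilization


def cost_advantage(rival_cost: float, own_cost: float) -> float:
    """How many training runs the owner gets per run the rival can afford."""
    return rival_cost / own_cost


if __name__ == "__main__":
    dojo_target = 1.0            # relative units, per the stated target
    rented_h100 = (3.0, 4.0)     # relative units, per the article's estimate
    for utilization in (1.0, 0.75, 0.5):   # hypothetical utilization levels
        dojo = effective_cost(dojo_target, utilization)
        low, high = (cost_advantage(r, dojo) for r in rented_h100)
        print(f"utilization {utilization:.0%}: {low:.2f}x to {high:.2f}x")
```

Under these assumptions the full 3-4x advantage holds only at full utilization; at 50% utilization it shrinks to 1.5-2x, which is why utilization and yield are the critical uncertainties.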

Section 2 — Waymo and Google TPU: Infrastructure Advantage by Parentage

Waymo’s compute advantage is structural rather than earned: as an Alphabet subsidiary, Waymo inherits access to Google’s TPU infrastructure — the same custom silicon that powers Google Search, YouTube recommendations, and Gemini training runs. No AV startup can match this without comparable capital investment.

Metric	Google TPU v4/v5	Waymo’s Access	Strategic Implication
Architecture	Google custom TPU (Tensor Processing Unit); v4 = 275 TFLOPS; v5p = approximately 460 TFLOPS per chip (est.)	Waymo is an Alphabet subsidiary — full access to Google’s TPU fleet and Google Cloud infrastructure	No capital outlay for Waymo on compute; Alphabet absorbs infrastructure cost
Cluster scale	Google operates some of the largest TPU clusters in the world (exact capacity not disclosed)	Waymo effectively has access to Google-scale compute on-demand	Waymo’s compute ceiling is Google’s entire infrastructure
Simulation infrastructure	Google’s NeRF-based scene reconstruction (simulation at scale) runs on TPU + GPU clusters	Waymo’s simulation pipeline multiplies real miles into synthetic training data	1 real mile produces 1,000+ simulation variations; TPU trains on all of them
Cost to Waymo	Internal Alphabet cost allocation (not disclosed); Waymo pays internal transfer price	Effectively a subsidy from Alphabet	Competitive moat: no AV startup can match Google’s compute without similar capex
Risk	Dependency on Alphabet: if Waymo is spun out or divested, TPU access may change	Alphabet has shown willingness to continue Waymo investment	Low risk while Waymo remains subsidiary; medium risk if standalone IPO
Training focus	Waymo’s MultiPath++ (trajectory prediction), OccupancyFlow (environment model), perception stack	All major Waymo neural nets trained on Google TPU infrastructure	Google Brain / DeepMind collaboration possible (est.)

The simulation multiplier is Waymo’s most underappreciated compute amplifier. A single mile of real-world driving data can be converted into thousands of simulation variants — different lighting conditions, different pedestrian behaviors, different traffic patterns, sensor noise variations. Every simulation variant is a new training example. When those examples are generated and processed by Google-scale TPU infrastructure, Waymo’s effective training data volume far exceeds its physical fleet miles. This is the structural reason why Waymo’s perception models have historically led on long-tail edge cases: the training distribution is wider than the raw mile count suggests.
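
To show how one logged scenario fans out into many training examples, here is a minimal sketch that enumerates the cross product of the variation axes named above. The axis values and the `scenario_variants` helper are hypothetical and chosen so the count lands at 1,000; Waymo's actual simulation pipeline is not public in this detail.

```python
from itertools import product

# Hypothetical variation axes; the four categories follow the paragraph above.
VARIATION_AXES = {
    "lighting": ["dawn", "noon", "dusk", "night"],
    "pedestrian_behavior": ["none", "waiting", "crossing", "jaywalking", "running"],
    "traffic_pattern": ["empty", "light", "moderate", "dense", "stop_and_go"],
    "sensor_noise_level": [round(0.1 * i, 1) for i in range(10)],
}


def scenario_variants(base_scenario_id: str, axes: dict = VARIATION_AXES):
    """Yield one variant description per combination of axis values."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield {"base": base_scenario_id, **dict(zip(names, combo))}


def effective_miles(real_miles: float, axes: dict = VARIATION_AXES) -> float:
    """Real miles multiplied by the number of variants per mile."""
    variants_per_mile = 1
    for values in axes.values():
        variants_per_mile *= len(values)
    return real_miles * variants_per_mile


if __name__ == "__main__":
    variants = list(scenario_variants("log_0001"))
    print(len(variants), "variants from one logged scenario")
    print(effective_miles(1.0), "effective miles per real mile")
```

Because the count is a product of axis sizes, adding one more axis or a few more values per axis multiplies the training set rather than adding to it.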

Section 3 — NVIDIA: The Incumbent Every Other AV Company Depends On

For any AV or robotics company that is not Tesla or Waymo, NVIDIA is the only viable path to compute at scale. This creates a structural dependency that limits training iteration speed to whatever H100/H200 capacity the company can afford or negotiate access to.

Metric	NVIDIA H100	NVIDIA H200	NVIDIA DRIVE Orin (In-Vehicle)
Architecture	Hopper; approximately 1 PetaFLOPS dense BF16 per GPU (NVIDIA’s 3.9 PetaFLOPS headline figure is FP8 with sparsity)	Hopper + HBM3e; same peak compute as H100, with larger and faster memory	254 TOPS per chip; automotive safety-grade
Cloud availability	AWS, Azure, GCP, CoreWeave, Lambda Labs — accessible to any AV company	H200 available through same cloud providers as H100	Sold to Tier 1 suppliers and AV companies
Cost	Approximately $2-4/hour on cloud (est., varies by provider and spot pricing)	Approximately $3-5/hour cloud (est.)	Approximately $750+ per chip (est.); in Zoox, Aurora, and many AV vehicles
AV companies using H100/H200 for training	Aurora, Zoox, Mobileye, Wayve, most non-Waymo/Tesla AV companies rent H100/H200 time	—	—
NVIDIA DRIVE platform	—	—	Separate product: DRIVE Orin (254 TOPS) + DRIVE Thor (2,000 TOPS, announced); in-vehicle AI for AV perception/planning
Strategic role	Default infrastructure for AV training if you don’t have Dojo or Google TPU	H200 = current frontier; H100 = accessible and widely available	NVIDIA’s in-vehicle compute dominates the non-Tesla/non-Waymo AV market
Risk for AV companies	Concentration risk: NVIDIA pricing power; H100 supply constraints in 2023 caused training delays	—	All competitors dependent on NVIDIA for in-vehicle compute except Tesla (HW4) and Waymo (custom TPU)

The NVIDIA dependency creates a strategic asymmetry that compounds over time. Aurora, Zoox, Mobileye, and Wayve are all running their training workloads on rented H100 clusters. When NVIDIA announces H200 or the next Blackwell-generation chip, every one of these companies benefits equally — no one gets an advantage from hardware access. The training compute floor rises for everyone, but the ceiling remains the same: whatever the cloud market will sell you at market price. Tesla and Waymo, by contrast, control their own compute ceilings and can expand capacity through internal investment rather than cloud procurement.
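
The budget-capped ceiling described above reduces to simple arithmetic. The sketch below uses the $2-4 per GPU-hour range from the table; the cluster size, run length, and monthly budget are hypothetical placeholders, not figures from any of the companies named.

```python
def rented_run_cost(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of one training run on rented GPUs, billed per GPU-hour."""
    return num_gpus * hours * usd_per_gpu_hour


def runs_within_budget(monthly_budget_usd: float, run_cost_usd: float) -> int:
    """Whole training runs that fit inside a monthly cloud budget."""
    return int(monthly_budget_usd // run_cost_usd)


if __name__ == "__main__":
    gpus, hours = 1024, 72          # hypothetical run: 1,024 GPUs for 3 days
    budget = 2_000_000              # hypothetical monthly cloud budget (USD)
    for rate in (2.0, 4.0):         # low and high end of the est. H100 range
        cost = rented_run_cost(gpus, hours, rate)
        runs = runs_within_budget(budget, cost)
        print(f"${rate:.0f}/GPU-hour: ${cost:,.0f} per run, {runs} runs/month")
```

With these placeholder inputs, doubling the hourly rate roughly halves the number of runs a fixed budget can buy, which is the sense in which market pricing sets the iteration ceiling for NVIDIA-dependent companies.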

The in-vehicle story is equally concentrated. NVIDIA DRIVE Orin (254 TOPS) is the dominant AV compute platform for every major AV program except Tesla (which uses its own HW4 chip at 288 TOPS) and Waymo (which uses custom in-vehicle TPU). NVIDIA DRIVE Thor at 2,000 TOPS is the announced successor, but until it ships at volume, Orin is the standard. This means NVIDIA controls not just the training-side compute dependency but also the inference-side compute dependency — creating a double leverage point over the entire non-Tesla/non-Waymo AV industry.

Section 4 — Compute as a Ramp Multiplier: Training Iteration Speed

The practical consequence of compute infrastructure differences is training iteration speed — how many model improvement cycles each company can run per month. More iterations mean faster disengagement rate improvement, faster edge-case coverage, faster adaptation to new driving environments.

Company	Training Cluster	Est. Training Runs/Month	Model Iteration Speed	Ramp Implication
Tesla	Dojo ExaPOD (1+ ExaFLOP est.)	High: proprietary cluster dedicated to FSD + Optimus	Fastest iteration if Dojo performs to spec (est.)	Disengagement rate improves at a pace proportional to training iteration speed
Waymo	Google TPU (Alphabet-scale)	Very high — Google infrastructure; no contention with commercial customers	Very fast; Google’s compute scale unmatched	Waymo’s simulation-to-training pipeline multiplies effective compute
Aurora	Rented NVIDIA H100/H200	Moderate — budget-constrained; prioritizes safety validation	Moderate — dependent on capital	Fundraising constraint limits training iteration
Zoox	Amazon cloud (EC2 + rented H100) — Amazon owns Zoox	High — Amazon infrastructure	Fast — Amazon cloud access similar to Waymo’s Google advantage	Underappreciated: Zoox’s Amazon ownership = cloud compute on demand
Mobileye	Intel compute + rented H100	Moderate	Moderate	EyeQ chip team has silicon expertise; less training-compute focused
Figure AI	Rented H100; NVIDIA partnership	Moderate	Moderate	OpenAI language model integration = unique compute access for language component

Zoox deserves particular attention in this table because it is the most underappreciated compute-advantaged AV company after Tesla and Waymo. Amazon’s ownership of Zoox gives it access to AWS infrastructure at internal transfer prices — a structural advantage essentially parallel to Waymo’s TPU access. This advantage has not translated into visible commercial traction yet, but it means Zoox’s training iteration ceiling is not limited by the same budget constraints that cap Aurora’s experiments.
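
The iteration-speed framing in this section can be made concrete with a simple throughput estimate: runs per month equal the usable compute in a month divided by the compute one run needs. The sketch below is illustrative; the utilization fraction and the FLOPs-per-run figure are hypothetical, since no company in the table discloses them.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600  # 2,592,000 seconds


def training_runs_per_month(cluster_flops: float,
                            utilization: float,
                            flops_per_run: float) -> float:
    """Estimated full training runs per 30-day month.

    cluster_flops: peak cluster throughput in FLOP per second.
    utilization:   hypothetical fraction of peak sustained in practice.
    flops_per_run: hypothetical total FLOPs needed for one training run.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    usable_flops = cluster_flops * utilization * SECONDS_PER_30_DAY_MONTH
    return usable_flops / flops_per_run


if __name__ == "__main__":
    one_exaflop = 1e18  # roughly one ExaPOD at peak, per Section 1
    runs = training_runs_per_month(one_exaflop, 0.4, 1e23)
    print(f"{runs:.1f} runs per month under the assumed inputs")
```

The estimate is linear in both cluster size and utilization, so a company that doubles either one doubles its monthly experiment count for the same model size.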

Section 5 — Compute Infrastructure Benchmark Scorecard

Dimension	Tesla (Dojo)	Waymo (Google TPU)	NVIDIA-Dependent Companies	Edge
Raw compute available	Approximately 1+ ExaFLOP est. (growing)	Google-scale (not disclosed; effectively unlimited)	Limited by budget and H100 availability	Waymo (Alphabet backstop)
Cost per FLOP (est.)	$1/FLOP target (if Dojo delivers)	Near-zero (internal transfer)	$3-4/FLOP cloud (est.)	Waymo or Tesla (depending on Dojo yield)
Custom silicon advantage	Yes — Dojo D1; D2 in development	Yes — Google TPU v4/v5	No — dependent on NVIDIA	Tesla + Waymo both have custom silicon moat
Iteration speed	High (dedicated cluster)	Very high (Google scale + simulation multiplier)	Moderate (budget-constrained)	Waymo slight edge (simulation multiplier compounds)
In-vehicle compute	HW4 = 288 TOPS (4x custom TSMC 7nm); no NVIDIA dependency	Waymo custom TPU in vehicle	NVIDIA DRIVE Orin (approximately $750+ est.)	Tesla (vertical integration; no 3rd party cost or lead time)
Dojo vs TPU verdict	Dojo is a multi-billion dollar bet that custom silicon produces a durable cost moat	Google TPU is already proven at scale; Waymo inherits it	—	Both superior to rented H100 at scale; Waymo’s access is larger today

The compute infrastructure scorecard reveals a two-tier Physical AI industry. Tier 1 consists of Tesla and Waymo — both with custom silicon for training and inference, both with dedicated clusters that can expand independently of cloud market pricing, and both with in-vehicle compute that does not depend on NVIDIA. Tier 2 consists of every other AV and robotics company, all of which are structurally dependent on NVIDIA for both training and in-vehicle inference, with training iteration rates limited by cloud budgets rather than infrastructure ceiling.

The long-run implication is a training iteration gap that compounds over time. If Tesla can run 3x as many FSD training experiments per month as Aurora because Dojo costs one-third as much per FLOP as rented H100 capacity, and if Waymo can run 5x as many experiments because Google infrastructure has no practical capacity ceiling, then the neural network quality gap between Tier 1 and Tier 2 widens with every passing month, independent of the quality of the research teams involved. Compute infrastructure is not a sufficient condition for Physical AI leadership, but it is increasingly a necessary one.
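
The widening gap can be illustrated by counting cumulative experiments. This sketch uses the 3x and 5x multipliers from the paragraph above and a hypothetical baseline of 10 runs per month for a Tier 2 company; it counts experiments only and makes no claim about how experiments translate into model quality.

```python
BASELINE_RUNS_PER_MONTH = 10   # hypothetical Tier 2 baseline
MULTIPLIERS = {"Tier 2 (rented H100)": 1, "Tesla (3x)": 3, "Waymo (5x)": 5}


def cumulative_runs(runs_per_month: int, months: int) -> int:
    """Total experiments completed after a number of months."""
    return runs_per_month * months


def experiment_gap(multiplier: int, months: int,
                   baseline: int = BASELINE_RUNS_PER_MONTH) -> int:
    """Extra experiments accumulated relative to the baseline company."""
    return cumulative_runs(multiplier * baseline, months) - cumulative_runs(baseline, months)


if __name__ == "__main__":
    for months in (1, 6, 12):
        row = ", ".join(
            f"{name}: {cumulative_runs(m * BASELINE_RUNS_PER_MONTH, months)}"
            for name, m in MULTIPLIERS.items()
        )
        print(f"after {months:>2} months -> {row}")
```

Under these assumptions the gap in completed experiments grows every month at a fixed multiplier, reaching 240 and 480 extra runs after a year for the 3x and 5x cases respectively.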

Note: Figures labeled “(est.)” reflect public market information, company disclosures, analyst estimates, and industry reports as of mid-2026. This article does not constitute investment advice.