2026-06-18
Physical AI Compute Infrastructure — Tesla Dojo vs Google TPU vs NVIDIA H100 in the AV Training Arms Race
Tesla bets on Dojo custom silicon at $1/FLOP target while Waymo inherits Google TPU scale; both outpace NVIDIA-dependent rivals on training iteration speed.
Article 130 in the Physical AI Benchmark Series — Physical AI Compute Infrastructure: Tesla Dojo vs Google TPU vs NVIDIA H100/H200 in the AI Training Arms Race Behind FSD, Waymo’s Neural Nets, and Optimus Policy Learning
The AI models that power FSD, Waymo’s perception system, and Optimus policy learning are trained on massive compute clusters. Compute infrastructure determines how fast each company can iterate: how quickly it can train a new model, run ablation experiments, and deploy improvements to the fleet. Tesla bet on custom silicon (Dojo D1 chips and ExaPOD clusters). Waymo inherits Google’s world-class TPU infrastructure. Most other AV and robotics companies rent NVIDIA H100/H200 clusters, either on the open cloud market or through a parent company’s cloud. This article maps compute infrastructure as a Physical AI benchmark dimension.
All figures labeled “(est.)” are derived from public market information, company disclosures, and analyst estimates rather than verified primary data.
Section 1 — Tesla Dojo: The Custom Silicon Bet
Tesla’s Dojo program is the most ambitious custom-silicon bet in the autonomous-vehicle industry. Rather than renting NVIDIA compute from cloud providers, Tesla designed its own training chip (the Dojo D1) and assembled it into ExaPOD clusters dedicated entirely to FSD training, auto-labeling, and Optimus policy learning.
| Metric | Dojo D1 Chip | Dojo ExaPOD (Training Cluster) | Status (mid-2026) |
|---|---|---|---|
| Architecture | Custom TSMC 7nm; 362 TFLOPS BF16; 10 TB/s on-chip bandwidth (disclosed by Tesla) | 25 Dojo D1 chips per training tile; 120 tiles (3,000 chips) per ExaPOD, spread across 10 cabinets | Custom design; no GPU vendor dependency |
| Target compute | 1 ExaFLOP per ExaPOD cluster (target, disclosed by Tesla) | Multiple ExaPODs = multiple ExaFLOPs total | Approximately 1 ExaFLOP achieved (est.) per Tesla AI Day disclosures |
| Training purpose | FSD neural net (video to driving policy); Occupancy Network; Auto-Labeling pipeline | Full FSD training runs: takes raw video from 6M+ fleet and produces updated FSD model | FSD v12/v13/v14 trained on Dojo (est.) |
| Optimus use | Optimus policy learning (manipulation, navigation) shares same Dojo infrastructure (est.) | Humanoid policy requires more diverse data than driving — higher compute per improvement (est.) | Early Optimus training on Dojo (est.); expanding |
| Cost vs NVIDIA alternative | Musk has cited $1/training FLOP target vs approximately $3-4/FLOP for rented NVIDIA H100 clusters (est.) | If achieved: approximately 3-4x cost advantage per training run vs cloud GPU | Advantage depends on Dojo utilization rate and yield |
| Risk | Custom silicon yield risk; TSMC 7nm now mature but Dojo architecture proprietary; if chip design has bugs, slow to fix | Single-source dependency on Tesla’s own chip team | Key risk: NVIDIA H100 clusters available now and at scale; Dojo buildout has had delays |
| Strategic value | If Dojo delivers on cost target: Tesla trains FSD faster and cheaper than any competitor who rents compute | Training speed = model iteration speed = disengagement rate improvement rate | The Dojo bet converts capex into a durable cost moat |
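The headline figures in the table can be cross-checked with simple arithmetic. The sketch below multiplies the per-chip BF16 throughput by the ExaPOD chip count, both taken from the table. It gives peak theoretical throughput only and says nothing about sustained utilization; the function and constant names are illustrative.

```python
# Peak-throughput cross-check using only figures from the table above.
D1_TFLOPS_BF16 = 362       # per-chip peak BF16 throughput, as listed in the table
CHIPS_PER_EXAPOD = 3_000   # D1 chips per ExaPOD, as listed in the table


def exapod_peak_exaflops(chips: int = CHIPS_PER_EXAPOD,
                         tflops_per_chip: float = D1_TFLOPS_BF16) -> float:
    """Return peak cluster throughput in ExaFLOPS (1 EFLOPS = 1,000,000 TFLOPS)."""
    return chips * tflops_per_chip / 1_000_000


if __name__ == "__main__":
    print(f"Peak ExaPOD throughput: {exapod_peak_exaflops():.3f} EFLOPS")
```

The result, roughly 1.09 ExaFLOPS, is consistent with the 1 ExaFLOP per ExaPOD target in the table.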
The Dojo thesis rests on a straightforward cost equation: if Tesla can train a model for $1/FLOP rather than $3-4/FLOP on rented H100 clusters, each FSD training run costs one-quarter to one-third as much as the equivalent run at a competitor. Over hundreds of training runs per year, that cost advantage becomes an iteration-speed advantage, because more experiments per dollar means faster convergence to a better model. The critical uncertainty is whether Dojo meets its utilization and yield targets; custom silicon programs historically run 12-24 months late and incur higher-than-expected per-chip costs in early production (est.).
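The cost equation can be sketched in a few lines. The example treats the article's $1 and $3-4 figures as relative cost indices per unit of training compute, and it adds one assumption that is not in the source: owned hardware is paid for whether it is busy or idle, so its effective cost is the nominal cost divided by utilization. The budget of 1,200 and the function names are hypothetical.

```python
def runs_per_budget(budget: float, compute_units_per_run: float,
                    cost_per_unit: float) -> float:
    """Number of training runs a fixed budget buys at a given unit cost."""
    if compute_units_per_run <= 0 or cost_per_unit <= 0:
        raise ValueError("compute_units_per_run and cost_per_unit must be positive")
    return budget / (compute_units_per_run * cost_per_unit)


def utilization_adjusted_cost(nominal_cost_per_unit: float,
                              utilization: float) -> float:
    """Effective unit cost of owned hardware (assumption: idle time is still paid for)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return nominal_cost_per_unit / utilization


if __name__ == "__main__":
    budget = 1_200.0  # hypothetical budget, in the same relative units as the costs
    for label, cost in [("in-house target", 1.0),
                        ("rented, low end", 3.0),
                        ("rented, high end", 4.0)]:
        print(f"{label:17s} -> {runs_per_budget(budget, 1.0, cost):.0f} runs")

    # Under the stated assumption, the in-house advantage shrinks as utilization drops.
    for utilization in (1.0, 0.5, 0.25):
        cost = utilization_adjusted_cost(1.0, utilization)
        print(f"utilization {utilization:.2f} -> effective cost {cost:.2f}")
```

Under this simplified model, an in-house cluster running at 25% utilization has the same effective unit cost as the high end of the rental range, which is why the utilization rate matters as much as the nominal target.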
Section 2 — Waymo and Google TPU: Infrastructure Advantage by Parentage
Waymo’s compute advantage is structural rather than earned: as an Alphabet subsidiary, Waymo inherits access to Google’s TPU infrastructure — the same custom silicon that powers Google Search, YouTube recommendations, and Gemini training runs. No AV startup can match this without comparable capital investment.
| Metric | Google TPU v4/v5 | Waymo’s Access | Strategic Implication |
|---|---|---|---|
| Architecture | Google custom TPU (Tensor Processing Unit); v4 = 275 TFLOPS; v5p = approximately 460 TFLOPS per chip (est.) | Waymo is an Alphabet subsidiary — full access to Google’s TPU fleet and Google Cloud infrastructure | No capital outlay for Waymo on compute; Alphabet absorbs infrastructure cost |
| Cluster scale | Google operates some of the largest TPU clusters in the world (exact capacity not disclosed) | Waymo effectively has access to Google-scale compute on-demand | Waymo’s compute ceiling is Google’s entire infrastructure |
| Simulation infrastructure | Google’s NeRF-based scene reconstruction (simulation at scale) runs on TPU + GPU clusters | Waymo’s simulation pipeline multiplies real miles into synthetic training data | 1 real mile produces 1,000+ simulation variations; TPU trains on all of them |
| Cost to Waymo | Internal Alphabet cost allocation (not disclosed); Waymo pays internal transfer price | Effectively a subsidy from Alphabet | Competitive moat: no AV startup can match Google’s compute without similar capex |
| Risk | Dependency on Alphabet: if Waymo is spun out or divested, TPU access may change | Alphabet has shown willingness to continue Waymo investment | Low risk while Waymo remains subsidiary; medium risk if standalone IPO |
| Training focus | Waymo’s MultiPath++ (trajectory prediction), OccupancyFlow (environment model), perception stack | All major Waymo neural nets trained on Google TPU infrastructure | Google DeepMind collaboration possible (est.); Google Brain was merged into Google DeepMind in 2023 |
The simulation multiplier is Waymo’s most underappreciated compute amplifier. A single mile of real-world driving data can be converted into thousands of simulation variants — different lighting conditions, different pedestrian behaviors, different traffic patterns, sensor noise variations. Every simulation variant is a new training example. When those examples are generated and processed by Google-scale TPU infrastructure, Waymo’s effective training data volume far exceeds its physical fleet miles. This is the structural reason why Waymo’s perception models have historically led on long-tail edge cases: the training distribution is wider than the raw mile count suggests.
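The multiplier described above is essentially a combinatorial expansion. The sketch below enumerates variants of one logged mile as the Cartesian product of the four perturbation axes named in the paragraph. The specific axis values are hypothetical placeholders chosen so that the product comes to 1,000; this is an illustration of the counting argument, not a description of Waymo's actual simulation pipeline.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical perturbation values; only the four axis names come from the text.
PERTURBATION_AXES: Dict[str, List[str]] = {
    "lighting": ["noon", "dusk", "night", "low_sun_glare", "overcast"],
    "pedestrian_behavior": ["none", "crossing", "jaywalking", "hesitating", "running"],
    "traffic_pattern": ["free_flow", "dense", "stop_and_go", "cut_in", "merging"],
    "sensor_noise": ["clean", "lidar_dropout", "camera_blur", "rain_speckle",
                     "radar_ghost", "calibration_drift", "partial_occlusion",
                     "low_light_grain"],
}


def scenario_variants(axes: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one configuration dict per combination of perturbation values."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))


def variants_per_mile(axes: Dict[str, List[str]]) -> int:
    """Count combinations without materializing them."""
    count = 1
    for values in axes.values():
        count *= len(values)
    return count


def effective_training_miles(real_miles: int,
                             axes: Dict[str, List[str]] = PERTURBATION_AXES) -> int:
    """Real miles multiplied by the number of simulated variants per mile."""
    return real_miles * variants_per_mile(axes)


if __name__ == "__main__":
    print("First variant:", next(scenario_variants(PERTURBATION_AXES)))
    print("Variants per real mile:", variants_per_mile(PERTURBATION_AXES))
    print("Effective miles from 10 real miles:", effective_training_miles(10))
```

Because the count is a product, adding one more axis with ten values would multiply the variant count by ten, which is why the training distribution can grow much faster than the raw mile count.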
Section 3 — NVIDIA: The Incumbent Every Other AV Company Depends On
For any AV or robotics company that is not Tesla and not Waymo, NVIDIA is the only viable path to scale compute. This creates a structural dependency that limits training iteration speed to whatever H100/H200 capacity the company can afford or negotiate access to.
| Metric | NVIDIA H100 | NVIDIA H200 | NVIDIA DRIVE Orin (In-Vehicle) |
|---|---|---|---|
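| Architecture | Hopper; approximately 1 PetaFLOPS dense BF16 per GPU (SXM), roughly 2 PetaFLOPS with structured sparsity (disclosed) | Hopper + HBM3e; same peak compute as H100 with larger, higher-bandwidth memory (141 GB, 4.8 TB/s) | 254 TOPS per chip; automotive safety-grade |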
| Cloud availability | AWS, Azure, GCP, CoreWeave, Lambda Labs — accessible to any AV company | H200 available through same cloud providers as H100 | Sold to Tier 1 suppliers and AV companies |
| Cost | Approximately $2-4 per GPU-hour on cloud (est., varies by provider and spot pricing) | Approximately $3-5 per GPU-hour on cloud (est.) | Approximately $750+ per chip (est.); in Zoox, Aurora, and many AV vehicles |
| AV companies using H100/H200 for training | Aurora, Zoox, Mobileye, Wayve, most non-Waymo/Tesla AV companies rent H100/H200 time | — | — |
| NVIDIA DRIVE platform | — | — | Separate product: DRIVE Orin (254 TOPS) + DRIVE Thor (2,000 TOPS, announced); in-vehicle AI for AV perception/planning |
| Strategic role | Default infrastructure for AV training if you don’t have Dojo or Google TPU | H200 = current frontier; H100 = accessible and widely available | NVIDIA’s in-vehicle compute dominates the non-Tesla/non-Waymo AV market |
| Risk for AV companies | Concentration risk: NVIDIA pricing power; H100 supply constraints in 2023 caused training delays | — | All competitors dependent on NVIDIA for in-vehicle compute except Tesla (HW4) and Waymo (custom TPU) |
The NVIDIA dependency creates a strategic asymmetry that compounds over time. Aurora, Zoox, Mobileye, and Wayve are all running their training workloads on rented H100 clusters. When NVIDIA announces H200 or the next Blackwell-generation chip, every one of these companies benefits equally — no one gets an advantage from hardware access. The training compute floor rises for everyone, but the ceiling remains the same: whatever the cloud market will sell you at market price. Tesla and Waymo, by contrast, control their own compute ceilings and can expand capacity through internal investment rather than cloud procurement.
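To make the budget ceiling concrete, the sketch below prices a single training run at the H100 rental rates from the table above, read here as per-GPU hourly rates. The job size (1,024 GPUs for 72 hours) and the monthly budget of $1,000,000 are hypothetical values chosen for illustration, not figures from any company.

```python
def rental_cost(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Total rental cost of one training run in dollars."""
    return num_gpus * hours * price_per_gpu_hour


def runs_affordable(monthly_budget: float, num_gpus: int, hours: float,
                    price_per_gpu_hour: float) -> int:
    """Whole training runs that fit inside a monthly budget."""
    return int(monthly_budget // rental_cost(num_gpus, hours, price_per_gpu_hour))


if __name__ == "__main__":
    gpus, hours = 1_024, 72.0   # hypothetical job size
    budget = 1_000_000.0        # hypothetical monthly training budget
    for price in (2.0, 4.0):    # low and high end of the estimated H100 rate
        cost = rental_cost(gpus, hours, price)
        runs = runs_affordable(budget, gpus, hours, price)
        print(f"${price:.2f}/GPU-hour: ${cost:,.0f} per run, {runs} runs per month")
```

With these assumed inputs, the same budget buys six runs at the low end of the price range and three at the high end, so the number of experiments is set by market pricing rather than by anything the renting company controls.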
The in-vehicle story is equally concentrated. NVIDIA DRIVE Orin (254 TOPS) is the dominant AV compute platform for every major AV program except Tesla (which uses its own HW4 chip at 288 TOPS) and Waymo (which uses custom in-vehicle TPU). NVIDIA DRIVE Thor at 2,000 TOPS is the announced successor, but until it ships at volume, Orin is the standard. This means NVIDIA controls not just the training-side compute dependency but also the inference-side compute dependency — creating a double leverage point over the entire non-Tesla/non-Waymo AV industry.
Section 4 — Compute as a Ramp Multiplier: Training Iteration Speed
The practical consequence of compute infrastructure differences is training iteration speed — how many model improvement cycles each company can run per month. More iterations mean faster disengagement rate improvement, faster edge-case coverage, faster adaptation to new driving environments.
| Company | Training Cluster | Est. Training Runs/Month | Model Iteration Speed | Ramp Implication |
|---|---|---|---|---|
| Tesla | Dojo ExaPOD (1+ ExaFLOP est.) | High — proprietary cluster dedicated to FSD + Optimus | Fastest iteration if Dojo performs to spec (est.) | Disengagement rate improvement rate is proportional to training iteration speed |
| Waymo | Google TPU (Alphabet-scale) | Very high — Google infrastructure; no contention with commercial customers | Very fast; Google’s compute scale unmatched | Waymo’s simulation-to-training pipeline multiplies effective compute |
| Aurora | Rented NVIDIA H100/H200 | Moderate — budget-constrained; prioritizes safety validation | Moderate — dependent on capital | Fundraising constraint limits training iteration |
| Zoox | Amazon cloud (EC2 + rented H100) — Amazon owns Zoox | High — Amazon infrastructure | Fast — Amazon cloud access similar to Waymo’s Google advantage | Underappreciated: Zoox’s Amazon ownership = cloud compute on demand |
| Mobileye | Intel compute + rented H100 | Moderate | Moderate | EyeQ chip team has silicon expertise; less training-compute focused |
| Figure AI | Rented H100; NVIDIA partnership | Moderate | Moderate | OpenAI language model integration = unique compute access for language component |
Zoox deserves particular attention in this table because it is the most underappreciated compute-advantaged AV company after Tesla and Waymo. Amazon’s ownership of Zoox gives it access to AWS infrastructure at internal transfer prices — a structural advantage essentially parallel to Waymo’s TPU access. This advantage has not translated into visible commercial traction yet, but it means Zoox’s training iteration ceiling is not limited by the same budget constraints that cap Aurora’s experiments.
Section 5 — Compute Infrastructure Benchmark Scorecard
| Dimension | Tesla (Dojo) | Waymo (Google TPU) | NVIDIA-Dependent Companies | Edge |
|---|---|---|---|---|
| Raw compute available | Approximately 1+ ExaFLOP est. (growing) | Google-scale (not disclosed; effectively unlimited) | Limited by budget and H100 availability | Waymo (Alphabet backstop) |
| Cost per FLOP (est.) | $1/FLOP target (if Dojo delivers) | Near-zero (internal transfer) | $3-4/FLOP cloud (est.) | Waymo or Tesla (depending on Dojo yield) |
| Custom silicon advantage | Yes — Dojo D1; D2 in development | Yes — Google TPU v4/v5 | No — dependent on NVIDIA | Tesla + Waymo both have custom silicon moat |
| Iteration speed | High (dedicated cluster) | Very high (Google scale + simulation multiplier) | Moderate (budget-constrained) | Waymo slight edge (simulation multiplier compounds) |
| In-vehicle compute | HW4 = 288 TOPS (est.; 4x custom TSMC 7nm); no NVIDIA dependency | Waymo custom TPU in vehicle | NVIDIA DRIVE Orin (approximately $750+ est.) | Tesla (vertical integration; no 3rd party cost or lead time) |
| Dojo vs TPU verdict | Dojo is a multi-billion dollar bet that custom silicon produces a durable cost moat | Google TPU is already proven at scale; Waymo inherits it | — | Both superior to rented H100 at scale; Waymo’s access is larger today |
The compute infrastructure scorecard reveals a two-tier Physical AI industry. Tier 1 consists of Tesla and Waymo — both with custom silicon for training and inference, both with dedicated clusters that can expand independently of cloud market pricing, and both with in-vehicle compute that does not depend on NVIDIA. Tier 2 consists of every other AV and robotics company, all of which are structurally dependent on NVIDIA for both training and in-vehicle inference, with training iteration rates limited by cloud budgets rather than infrastructure ceiling.
The long-run implication is a training iteration gap that compounds over time. If Tesla can run 3x as many FSD training experiments per month as Aurora because Dojo costs one-third per FLOP compared to rented H100, and if Waymo can run 5x as many experiments because Google infrastructure has no capacity ceiling, then the neural network quality gap between Tier 1 and Tier 2 widens with every passing month — independent of the quality of the research teams involved. Compute infrastructure is not a sufficient condition for Physical AI leadership, but it is increasingly a necessary one.
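The widening gap can be illustrated by counting cumulative experiments. The sketch below uses the 3x and 5x multipliers from the paragraph above and a hypothetical baseline of 10 training experiments per month for a rented-compute company. It counts experiments only; how experiment count translates into model quality is not modeled here.

```python
def cumulative_experiments(runs_per_month: float, months: int) -> float:
    """Total experiments completed after a number of months at a constant rate."""
    return runs_per_month * months


def cumulative_lead(baseline_runs_per_month: float, multiplier: float,
                    months: int) -> float:
    """Extra experiments accumulated over the baseline company after `months`."""
    return (multiplier - 1) * baseline_runs_per_month * months


if __name__ == "__main__":
    baseline = 10.0  # hypothetical experiments per month on rented compute
    for months in (1, 6, 12, 24):
        lead_3x = cumulative_lead(baseline, 3, months)
        lead_5x = cumulative_lead(baseline, 5, months)
        print(f"month {months:2d}: 3x lead = {lead_3x:.0f}, 5x lead = {lead_5x:.0f} experiments")
```

Under these assumptions, a 3x multiplier yields a lead of 240 experiments after one year and a 5x multiplier yields 480, and the lead keeps growing for as long as the rate difference persists.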
Note: All figures labeled “(est.)” are derived from public market information, company disclosures, analyst estimates, and industry reports as of mid-2026. This article does not constitute investment advice.
Sources
- Tesla Dojo supercomputer, Tesla AI Day
- Google TPU infrastructure, Google Cloud
- NVIDIA H100 specifications, NVIDIA
- NVIDIA DRIVE Orin automotive, NVIDIA
- Waymo research and simulation infrastructure, Waymo