AI-Daily-Builder

2026-06-18

Physical AI Compute Infrastructure — Tesla Dojo vs Google TPU vs NVIDIA H100 in the AV Training Arms Race

Tesla bets on Dojo custom silicon at $1/FLOP target while Waymo inherits Google TPU scale; both outpace NVIDIA-dependent rivals on training iteration speed.

Article 130 in the Physical AI Benchmark Series — Physical AI Compute Infrastructure: Tesla Dojo vs Google TPU vs NVIDIA H100/H200 in the AI Training Arms Race Behind FSD, Waymo’s Neural Nets, and Optimus Policy Learning

The AI models that power FSD, Waymo’s perception system, and Optimus policy learning are trained on massive compute clusters. Compute infrastructure determines how fast each company can iterate: how quickly it can train a new model, run ablation experiments, and deploy improvements to the fleet. Tesla bet on custom silicon (Dojo D1 chips and ExaPOD clusters). Waymo inherits Google’s TPU infrastructure. Most other AV and robotics companies rent NVIDIA H100/H200 clusters. This article maps compute infrastructure as a Physical AI benchmark dimension.

All figures labeled “(est.)” are derived from public market information, company disclosures, and analyst estimates rather than verified primary data.


Section 1 — Tesla Dojo: The Custom Silicon Bet

Tesla’s Dojo program is the most ambitious custom-silicon bet in the autonomous-vehicle industry. Rather than renting NVIDIA compute from cloud providers, Tesla designed its own training chip (the Dojo D1) and assembled it into ExaPOD clusters dedicated entirely to FSD training, auto-labeling, and Optimus policy learning.

| Metric | Dojo D1 Chip | Dojo ExaPOD (Training Cluster) | Status (mid-2026) |
| --- | --- | --- | --- |
| Architecture | Custom TSMC 7nm; 362 TFLOPS BF16; 10 TB/s on-chip bandwidth (disclosed by Tesla) | 25 Dojo D1 chips per training tile; 120 tiles (3,000 chips) per ExaPOD | Custom design: no GPU vendor dependency |
| Target compute | 1 ExaFLOP per ExaPOD cluster (target, disclosed by Tesla) | Multiple ExaPODs = multiple ExaFLOPs total | Approximately 1 ExaFLOP achieved (est.) per Tesla AI Day disclosures |
| Training purpose | FSD neural net (video to driving policy); Occupancy Network; Auto-Labeling pipeline | Full FSD training runs: takes raw video from 6M+ fleet and produces updated FSD model | FSD v12/v13/v14 trained on Dojo (est.) |
| Optimus use | Optimus policy learning (manipulation, navigation) shares same Dojo infrastructure (est.) | Humanoid policy requires more diverse data than driving, so higher compute per improvement (est.) | Early Optimus training on Dojo (est.); expanding |
| Cost vs NVIDIA alternative | Musk has cited $1/training FLOP target vs approximately $3-4/FLOP for rented NVIDIA H100 clusters (est.) | If achieved: approximately 3-4x cost advantage per training run vs cloud GPU | Advantage depends on Dojo utilization rate and yield |
| Risk | Custom silicon yield risk; TSMC 7nm now mature but Dojo architecture proprietary; if chip design has bugs, slow to fix | Single-source dependency on Tesla’s own chip team | Key risk: NVIDIA H100 clusters available now and at scale; Dojo buildout has had delays |
| Strategic value | If Dojo delivers on cost target: Tesla trains FSD faster and cheaper than any competitor who rents compute | Training speed = model iteration speed = disengagement rate improvement rate | The Dojo bet converts capex into a durable cost moat |
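
As a quick sanity check on the table above, the 1 ExaFLOP target follows from the per-chip throughput and the 3,000-chip count. The sketch below does that arithmetic; the 40% utilization value is a purely hypothetical placeholder for illustration, not a Tesla figure, and the function names are made up for this example.

```python
# Figures taken from the table above (Tesla-disclosed per-chip throughput and chip count).
D1_TFLOPS_BF16 = 362          # peak BF16 TFLOPS per Dojo D1 chip
CHIPS_PER_EXAPOD = 3_000      # D1 chips per ExaPOD

TFLOPS_PER_EXAFLOPS = 1_000_000  # 1 EFLOPS = 10^6 TFLOPS


def peak_exaflops(chips: int, tflops_per_chip: float) -> float:
    """Theoretical peak throughput of a cluster, in EFLOPS."""
    return chips * tflops_per_chip / TFLOPS_PER_EXAFLOPS


def sustained_exaflops(chips: int, tflops_per_chip: float, utilization: float) -> float:
    """Peak throughput scaled by an assumed utilization fraction in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_exaflops(chips, tflops_per_chip) * utilization


if __name__ == "__main__":
    peak = peak_exaflops(CHIPS_PER_EXAPOD, D1_TFLOPS_BF16)
    print(f"Peak per ExaPOD: {peak:.3f} EFLOPS")
    # HYPOTHETICAL utilization, chosen only to show how far sustained
    # throughput can sit below the headline peak number.
    sustained = sustained_exaflops(CHIPS_PER_EXAPOD, D1_TFLOPS_BF16, 0.40)
    print(f"At 40% utilization: {sustained:.3f} EFLOPS")
```

The peak figure is a ceiling; the gap between peak and sustained throughput is why the table flags utilization as the deciding factor for the cost advantage.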

The Dojo thesis rests on a straightforward cost equation: if Tesla can train a model at the cited $1/FLOP target rather than the approximately $3-4/FLOP attributed to rented H100 clusters (est.), each FSD training run costs one-quarter to one-third as much as the equivalent run at a competitor. The unit behind these figures is not specified, so they are best read as a relative cost ratio of roughly 1:3 to 1:4. Over hundreds of training runs per year, that cost advantage becomes an iteration-speed advantage: more experiments per dollar means faster convergence to a better model. The critical uncertainty is whether Dojo meets its utilization and yield targets; custom silicon programs historically run 12-24 months late and incur higher-than-expected per-chip costs in early production (est.).
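
The cost equation above can be sketched in a few lines, treating the cited $1 and $3-4 figures as relative cost units rather than literal prices. The budget, the compute per run, and the simple utilization model are all assumptions made for illustration; rented capacity is assumed to be billed only while in use.

```python
def effective_unit_cost(nominal_unit_cost: float, utilization: float) -> float:
    """Cost per unit of useful compute when owned hardware is busy only part of the time.

    Assumption: the nominal cost is quoted at 100% utilization, so idle time
    inflates the cost of each useful unit proportionally.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return nominal_unit_cost / utilization


def runs_per_budget(budget: float, compute_per_run: float, unit_cost: float) -> int:
    """Whole training runs affordable for a fixed budget."""
    return int(budget // (compute_per_run * unit_cost))


def breakeven_utilization(owned_unit_cost: float, rented_unit_cost: float) -> float:
    """Utilization below which owned hardware stops being cheaper than renting."""
    return owned_unit_cost / rented_unit_cost


if __name__ == "__main__":
    BUDGET = 1200           # hypothetical budget, in relative cost units
    COMPUTE_PER_RUN = 10    # hypothetical compute units per training run

    for label, cost in [("Dojo target", 1.0), ("rented, low", 3.0), ("rented, high", 4.0)]:
        print(f"{label}: {runs_per_budget(BUDGET, COMPUTE_PER_RUN, cost)} runs")

    # If the cluster is busy only half the time, its effective cost doubles.
    print("Dojo at 50% utilization:", effective_unit_cost(1.0, 0.5))
    print("Break-even vs rented at 4.0:", breakeven_utilization(1.0, 4.0))
```

Under these assumptions the owned cluster keeps its edge only while utilization stays above the break-even fraction, which is one way to read the utilization caveat in the paragraph above.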


Section 2 — Waymo and Google TPU: Infrastructure Advantage by Parentage

Waymo’s compute advantage is structural rather than earned: as an Alphabet subsidiary, Waymo inherits access to Google’s TPU infrastructure — the same custom silicon that powers Google Search, YouTube recommendations, and Gemini training runs. No AV startup can match this without comparable capital investment.

| Metric | Google TPU v4/v5 | Waymo’s Access | Strategic Implication |
| --- | --- | --- | --- |
| Architecture | Google custom TPU (Tensor Processing Unit); v4 = 275 TFLOPS; v5p = approximately 460 TFLOPS per chip (est.) | Waymo is an Alphabet subsidiary: full access to Google’s TPU fleet and Google Cloud infrastructure | No capital outlay for Waymo on compute; Alphabet absorbs infrastructure cost |
| Cluster scale | Google operates some of the largest TPU clusters in the world (exact capacity not disclosed) | Waymo effectively has access to Google-scale compute on demand | Waymo’s compute ceiling is Google’s entire infrastructure |
| Simulation infrastructure | Google’s NeRF-based scene reconstruction (simulation at scale) runs on TPU + GPU clusters | Waymo’s simulation pipeline multiplies real miles into synthetic training data | 1 real mile produces 1,000+ simulation variations; TPU trains on all of them |
| Cost to Waymo | Internal Alphabet cost allocation (not disclosed); Waymo pays internal transfer price | Effectively a subsidy from Alphabet | Competitive moat: no AV startup can match Google’s compute without similar capex |
| Risk | Dependency on Alphabet: if Waymo is spun out or divested, TPU access may change | Alphabet has shown willingness to continue Waymo investment | Low risk while Waymo remains subsidiary; medium risk if standalone IPO |
| Training focus | Waymo’s MultiPath++ (trajectory prediction), OccupancyFlow (environment model), perception stack | All major Waymo neural nets trained on Google TPU infrastructure | Google DeepMind collaboration possible (est.) |

The simulation multiplier is Waymo’s most underappreciated compute amplifier. A single mile of real-world driving data can be converted into thousands of simulation variants: different lighting conditions, pedestrian behaviors, traffic patterns, and sensor noise levels. Every simulation variant is a new training example. When those examples are generated and processed by Google-scale TPU infrastructure, Waymo’s effective training data volume far exceeds its physical fleet miles. This is a plausible structural reason why Waymo’s perception models have handled long-tail edge cases well (est.): the training distribution is wider than the raw mile count suggests.
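
To make the multiplier concrete, the sketch below enumerates variants of one recorded segment as a Cartesian product over the four axes named above. The specific axis values and the `scenario_variants` helper are hypothetical and chosen only so the product lands at 1,000; they do not describe Waymo’s actual simulation pipeline.

```python
from itertools import product
from typing import Dict, Iterator

# HYPOTHETICAL variation axes; real simulators expose far richer parameters.
LIGHTING = ("dawn", "noon", "dusk", "night")
PEDESTRIAN_BEHAVIOR = ("none", "waiting", "crossing", "jaywalking", "running")
TRAFFIC_DENSITY = ("empty", "light", "moderate", "heavy", "gridlock")
SENSOR_NOISE_LEVEL = tuple(range(10))  # 10 discrete noise settings


def scenario_variants(segment_id: str) -> Iterator[Dict[str, object]]:
    """Yield one synthetic scenario per combination of variation axes."""
    axes = product(LIGHTING, PEDESTRIAN_BEHAVIOR, TRAFFIC_DENSITY, SENSOR_NOISE_LEVEL)
    for lighting, pedestrians, traffic, noise in axes:
        yield {
            "segment": segment_id,
            "lighting": lighting,
            "pedestrians": pedestrians,
            "traffic": traffic,
            "sensor_noise": noise,
        }


def effective_examples(real_miles: int, variants_per_mile: int) -> int:
    """Synthetic training examples produced from a given number of real miles."""
    return real_miles * variants_per_mile


if __name__ == "__main__":
    per_mile = sum(1 for _ in scenario_variants("segment-0001"))
    print("Variants per real mile:", per_mile)  # 4 * 5 * 5 * 10
    print("Examples from 1M real miles:", effective_examples(1_000_000, per_mile))
```

Because the variant count is a product of the axis sizes, adding one more axis or a few more values per axis grows the synthetic dataset multiplicatively, which is what makes the compute available to process it the binding constraint.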


Section 3 — NVIDIA: The Incumbent Every Other AV Company Depends On

For any AV or robotics company that is not Tesla and not Waymo, NVIDIA is the default path to compute at scale. This creates a structural dependency that limits training iteration speed to whatever H100/H200 capacity the company can afford or negotiate access to.

| Metric | NVIDIA H100 | NVIDIA H200 | NVIDIA DRIVE Orin (In-Vehicle) |
| --- | --- | --- | --- |
| Architecture | Hopper; approximately 1 PFLOPS dense BF16 per SXM GPU (the roughly 3.9 PFLOPS headline figure is FP8 with sparsity) | Hopper + HBM3e; same peak compute as H100, with higher memory capacity and bandwidth | 254 TOPS per chip; automotive safety-grade |
| Cloud availability | AWS, Azure, GCP, CoreWeave, Lambda Labs; accessible to any AV company | H200 available through same cloud providers as H100 | Sold to Tier 1 suppliers and AV companies |
| Cost | Approximately $2-4/hour on cloud (est., varies by provider and spot pricing) | Approximately $3-5/hour cloud (est.) | Approximately $750+ per chip (est.); in Zoox, Aurora, and many AV vehicles |
| AV companies using H100/H200 for training | Aurora, Zoox, Mobileye, Wayve, most non-Waymo/Tesla AV companies rent H100/H200 time | | |
| NVIDIA DRIVE platform | Separate product: DRIVE Orin (254 TOPS) + DRIVE Thor (2,000 TOPS, announced); in-vehicle AI for AV perception/planning | | |
| Strategic role | Default infrastructure for AV training if you don’t have Dojo or Google TPU | H200 = current frontier; H100 = accessible and widely available | NVIDIA’s in-vehicle compute dominates the non-Tesla/non-Waymo AV market |
| Risk for AV companies | Concentration risk: NVIDIA pricing power; H100 supply constraints in 2023 caused training delays | | All competitors dependent on NVIDIA for in-vehicle compute except Tesla (HW4) and Waymo (custom TPU) |
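
For a sense of scale, the hourly H100 price range in the table above translates into a per-run cost once a cluster size and duration are fixed. The sketch below does that multiplication; the 1,024-GPU, 72-hour run is a hypothetical example, not a figure reported by any company.

```python
from typing import Tuple


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed by a run that keeps every GPU busy throughout."""
    return num_gpus * wall_clock_hours


def run_cost_range(
    num_gpus: int,
    wall_clock_hours: float,
    low_rate: float,
    high_rate: float,
) -> Tuple[float, float]:
    """Low and high cost of one training run, given a per-GPU-hour price range."""
    hours = gpu_hours(num_gpus, wall_clock_hours)
    return hours * low_rate, hours * high_rate


if __name__ == "__main__":
    # HYPOTHETICAL run size; the $2-4 per GPU-hour range is the table's estimate.
    low, high = run_cost_range(num_gpus=1024, wall_clock_hours=72,
                               low_rate=2.0, high_rate=4.0)
    print(f"GPU-hours: {gpu_hours(1024, 72):,.0f}")
    print(f"Estimated cost per run: ${low:,.0f} to ${high:,.0f}")
```

Multiplying the per-run cost by the number of experiments a team wants to run each month gives the budget line that caps iteration speed for a company renting its compute.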

The NVIDIA dependency creates a strategic asymmetry that compounds over time. Aurora, Zoox, Mobileye, and Wayve all run training workloads on rented H100 clusters (est.). When NVIDIA ships a new generation, such as H200 or Blackwell, every one of these companies gets access on the same terms, so no one gains an advantage from hardware access. The training compute floor rises for everyone, but the ceiling remains the same: whatever the cloud market will sell at market price. Tesla and Waymo, by contrast, control their own compute ceilings and can expand capacity through internal investment rather than cloud procurement.

The in-vehicle story is equally concentrated. NVIDIA DRIVE Orin (254 TOPS) is the dominant AV compute platform for every major AV program except Tesla (which uses its own HW4 chip at 288 TOPS) and Waymo (which uses custom in-vehicle TPU). NVIDIA DRIVE Thor at 2,000 TOPS is the announced successor, but until it ships at volume, Orin is the standard. This means NVIDIA controls not just the training-side compute dependency but also the inference-side compute dependency — creating a double leverage point over the entire non-Tesla/non-Waymo AV industry.


Section 4 — Compute as a Ramp Multiplier: Training Iteration Speed

The practical consequence of compute infrastructure differences is training iteration speed — how many model improvement cycles each company can run per month. More iterations mean faster disengagement rate improvement, faster edge-case coverage, faster adaptation to new driving environments.

| Company | Training Cluster | Est. Training Runs/Month (qualitative) | Model Iteration Speed | Ramp Implication |
| --- | --- | --- | --- | --- |
| Tesla | Dojo ExaPOD (1+ ExaFLOP est.) | High: proprietary cluster dedicated to FSD + Optimus | Fastest iteration if Dojo performs to spec (est.) | Disengagement rate improvement rate is proportional to training iteration speed |
| Waymo | Google TPU (Alphabet-scale) | Very high: Google infrastructure; no contention with commercial customers | Very fast; Google’s compute scale unmatched | Waymo’s simulation-to-training pipeline multiplies effective compute |
| Aurora | Rented NVIDIA H100/H200 | Moderate: budget-constrained; prioritizes safety validation | Moderate: dependent on capital | Fundraising constraint limits training iteration |
| Zoox | Amazon cloud (EC2 + rented H100); Amazon owns Zoox | High: Amazon infrastructure | Fast: Amazon cloud access similar to Waymo’s Google advantage | Underappreciated: Zoox’s Amazon ownership = cloud compute on demand |
| Mobileye | Intel compute + rented H100 | Moderate | Moderate | EyeQ chip team has silicon expertise; less training-compute focused |
| Figure AI | Rented H100; NVIDIA partnership | Moderate | Moderate | OpenAI language model integration = unique compute access for language component |

Zoox deserves particular attention in this table because it is the most underappreciated compute-advantaged AV company after Tesla and Waymo. Amazon’s ownership of Zoox gives it access to AWS infrastructure at internal transfer prices — a structural advantage essentially parallel to Waymo’s TPU access. This advantage has not translated into visible commercial traction yet, but it means Zoox’s training iteration ceiling is not limited by the same budget constraints that cap Aurora’s experiments.


Section 5 — Compute Infrastructure Benchmark Scorecard

| Dimension | Tesla (Dojo) | Waymo (Google TPU) | NVIDIA-Dependent Companies | Edge |
| --- | --- | --- | --- | --- |
| Raw compute available | Approximately 1+ ExaFLOP est. (growing) | Google-scale (not disclosed; effectively unlimited) | Limited by budget and H100 availability | Waymo (Alphabet backstop) |
| Cost per FLOP (est.) | $1/FLOP target (if Dojo delivers) | Not disclosed (internal transfer price; effectively subsidized by Alphabet) | $3-4/FLOP cloud (est.) | Waymo or Tesla (depending on Dojo yield) |
| Custom silicon advantage | Yes: Dojo D1; D2 in development | Yes: Google TPU v4/v5 | No: dependent on NVIDIA | Tesla + Waymo both have custom silicon moat |
| Iteration speed | High (dedicated cluster) | Very high (Google scale + simulation multiplier) | Moderate (budget-constrained) | Waymo slight edge (simulation multiplier compounds) |
| In-vehicle compute | HW4 = 288 TOPS (4x custom TSMC 7nm); no NVIDIA dependency | Waymo custom TPU in vehicle | NVIDIA DRIVE Orin (approximately $750+ est.) | Tesla (vertical integration; no 3rd party cost or lead time) |
| Dojo vs TPU verdict | Dojo is a multi-billion dollar bet that custom silicon produces a durable cost moat | Google TPU is already proven at scale; Waymo inherits it | | Both superior to rented H100 at scale; Waymo’s access is larger today |

The compute infrastructure scorecard reveals a two-tier Physical AI industry. Tier 1 consists of Tesla and Waymo — both with custom silicon for training and inference, both with dedicated clusters that can expand independently of cloud market pricing, and both with in-vehicle compute that does not depend on NVIDIA. Tier 2 consists of every other AV and robotics company, all of which are structurally dependent on NVIDIA for both training and in-vehicle inference, with training iteration rates limited by cloud budgets rather than infrastructure ceiling.

The long-run implication is a training iteration gap that widens over time. If Tesla can run 3x as many FSD training experiments per month as Aurora because Dojo costs one-third as much per FLOP as rented H100 capacity, and if Waymo can run 5x as many because Google infrastructure imposes no practical capacity ceiling, then the cumulative experiment gap between Tier 1 and Tier 2 grows every month. To the extent that model quality tracks experiment count, the neural network quality gap grows with it, even between research teams of equal quality. Compute infrastructure is not a sufficient condition for Physical AI leadership, but it is increasingly a necessary one.
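
The widening gap can be illustrated with simple arithmetic on the 3x and 5x multipliers from the paragraph above. The baseline of 10 experiments per month is a hypothetical number picked for readability, and the sketch counts experiments only; it makes no claim about how experiment count maps to model quality.

```python
from typing import List


def cumulative_experiments(per_month: int, months: int) -> int:
    """Total experiments run after a number of months at a constant rate."""
    return per_month * months


def gap_by_month(baseline_per_month: int, multiplier: int, months: int) -> List[int]:
    """Cumulative experiment lead over the baseline at the end of each month."""
    faster_rate = baseline_per_month * multiplier
    return [
        cumulative_experiments(faster_rate, m) - cumulative_experiments(baseline_per_month, m)
        for m in range(1, months + 1)
    ]


if __name__ == "__main__":
    BASELINE = 10  # HYPOTHETICAL experiments per month for a budget-constrained team

    for label, multiplier in [("3x (Tesla scenario)", 3), ("5x (Waymo scenario)", 5)]:
        gaps = gap_by_month(BASELINE, multiplier, months=12)
        print(f"{label}: lead after 1 month = {gaps[0]}, after 12 months = {gaps[-1]}")
```

Even with constant monthly rates the lead grows linearly with time; any mechanism that lets earlier experiments speed up later ones would make it grow faster still.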

Note: All figures labeled “(est.)” are derived from public market information, company disclosures, analyst estimates, and industry reports as of mid-2026. This article does not constitute investment advice.

