2026-06-18

Physical AI Compute Race 2026 — NVIDIA B200 vs Tesla Dojo vs Google TPU: The AV and Robotics Training Infrastructure Benchmark

NVIDIA GPUs, now led by the B200 (est. 9 petaFLOPS FP8 per chip), power virtually all AV AI training. Tesla Dojo bets on custom silicon. Waymo uses Google TPU. Compute decides the race.

Article 205 in the Physical AI Benchmark Series

The Physical AI race is, at its core, a compute race. The companies that can run more training experiments per unit time will iterate faster, discover better driving and robotics policies sooner, and ultimately deploy superior products first. This is the lesson of large language models applied directly to physical systems: scaling laws hold, and the entities with more training compute win over medium-to-long time horizons. In 2026, three training compute ecosystems are competing for dominance in the Physical AI stack: NVIDIA’s GPU clusters (H100, H200, Blackwell B200), Tesla’s proprietary Dojo supercomputer, and Google’s TPU infrastructure (accessed by Waymo via Alphabet ownership). The architecture, cost structure, and strategic implications of each will shape who wins the AV and robotics race in the back half of this decade.

Section 1 — Why Compute Is the Physical AI Battleground

Physical AI’s progress rate is compute-bound in the same way that large language model progress has been compute-bound. The more training GPU-hours an AV company can apply to its neural network, the faster the disengagement rate falls, the broader the geographic coverage the model generalizes to, and the longer the edge-case tail the model can handle correctly. Scaling laws are not a hope in physical AI; they are an observed empirical regularity that every serious competitor is now building its roadmap around.

Principle	Explanation	Physical AI implication
Scaling laws apply	Neural network performance improves predictably with more compute, more data, and larger models (Chinchilla scaling laws; OpenAI scaling papers)	More training compute on more data → better driving or robotics policy; the same mechanism by which LLMs improve with scale
Two distinct compute environments	Training compute (cluster-scale GPU/TPU, thousands of chips) vs inference compute (on-vehicle chip, runs the deployed model in real time)	Different optimization targets: training = maximize throughput and minimize cost per experiment; inference = minimize latency and power per decision
Training data volume	Tesla’s fleet generates est. tens of millions of miles of training data per week; processing it all requires enormous training compute	Without sufficient training compute, the data flywheel slows: collected data sits unprocessed and the data volume advantage is wasted
Iteration speed compounds	Faster training → more experiments per unit time → faster architecture discovery; over 2-3 years, this compounds into a substantial model quality lead	A company running 3x more training experiments per dollar discovers better policies 3x faster; compounding produces large gaps over 24-36 months
Inference latency is safety-critical	A 100ms perception delay at 60 mph = 2.7 meters of blind travel	On-vehicle inference must be fast enough to react to road hazards before the vehicle has traveled a dangerous distance
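
The blind-travel figure in the last row follows from distance = speed × latency. A minimal sketch of that arithmetic; the function name and the sample latencies are illustrative assumptions, not values from any vendor specification:

```python
MPH_TO_MPS = 1609.344 / 3600.0  # metres per second in one mile per hour


def blind_travel_m(speed_mph: float, latency_ms: float) -> float:
    """Distance travelled (in metres) while perception output is still pending."""
    return speed_mph * MPH_TO_MPS * (latency_ms / 1000.0)


if __name__ == "__main__":
    # Illustrative latencies only; real perception latency budgets vary by system.
    for latency_ms in (50, 100, 200):
        distance = blind_travel_m(60, latency_ms)
        print(f"{latency_ms} ms at 60 mph -> {distance:.2f} m")
```

At 60 mph and 100 ms this gives about 2.68 m, which the table rounds to 2.7 m. The distance scales linearly with both speed and latency.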

Training compute is the “lab” where AV and robotics AI models are built from data. It happens on massive GPU or TPU clusters — thousands to tens of thousands of chips — in data centers. The battle here is about cost per FLOP, cluster throughput, interconnect bandwidth, and memory capacity for large models. Physical AI models are typically large: transformer-based perception models, diffusion policies for robotics manipulation, end-to-end video-to-action neural networks. Running gradient descent on these models, across billions of training frames, requires compute at a scale that is measurable in exaFLOPS.

Inference compute is the “vehicle”: the deployed model running in real time on the physical system. For AVs, this is the chip embedded in the car that must process eight camera feeds, run the full neural network, and output steering, acceleration, and braking commands, all within milliseconds. For humanoid robots, it is the onboard compute that maps sensor inputs to motor commands at whatever cycle frequency the physical task demands. Tesla’s FSD HW4 chip (est. 350+ TOPS) and Waymo’s on-vehicle compute (NVIDIA DRIVE-based or custom, depending on generation) are the inference side of this race.

The AMD factor is real but secondary in 2026. AMD’s MI300X offers 192 GB of HBM3 memory (versus the H100’s 80 GB) and est. 2.6 petaFLOPS FP8 per chip, which are competitive raw specs. But CUDA ecosystem lock-in makes switching costly. Virtually all AV training code is written for CUDA, and AMD’s ROCm software stack is less mature. Some AV companies may use AMD clusters for cost reasons, but NVIDIA ecosystem dominance is the default for the industry in 2026.

Section 2 — NVIDIA’s AV and Robotics Training Cluster Dominance

NVIDIA supplies the training compute infrastructure for virtually every AV and robotics company that does not have proprietary training silicon. The H100, H200, and Blackwell B200 form the standard stack. NVIDIA’s Isaac simulation suite (Isaac Lab, Isaac Gym) adds GPU-accelerated physics simulation for robotics training data generation. The DRIVE platform handles on-vehicle inference for AV companies that do not build custom silicon.

NVIDIA chip	Specs	Physical AI use case	Price / availability (est.)
H100 SXM5	80 GB HBM3; est. 3.35 petaFLOPS FP8 per chip; NVLink 4.0; 700W TDP; DGX H100 = 8 H100 per system	Primary training chip for virtually all AV companies in 2024-2025; Waymo, Aurora, Figure AI, Agility, Boston Dynamics (Atlas); used for training perception models, motion prediction, trajectory optimization	Est. $25K-$30K per chip; DGX H100 system est. $200K-$250K; cloud H100: est. $2-$3/hr per GPU
H200 SXM5	141 GB HBM3e (about 75% more memory than H100); same training throughput as H100 on compute-bound workloads; memory bandwidth advantage on memory-bound tasks	Large-model Physical AI training (vision-language models like Figure AI’s VLM, Tesla’s end-to-end model); higher memory capacity enables larger batch sizes without running out of memory	Est. $30K-$40K per chip; successor to H100 in 2024-2025 deployment cycles
B100 / B200 (Blackwell)	B200: est. 192 GB HBM3e; est. 9 petaFLOPS FP8 per chip, roughly 2.7x H100; NVLink 5.0; significantly higher memory bandwidth	Next-generation AV training; Figure AI, Tesla, Aurora all likely transitioning training clusters to Blackwell in 2025-2026; roughly 2.7x peak throughput per chip cuts training time proportionally or enables larger models in the same time	Est. $35K-$45K per chip; availability ramping 2025-2026; demand exceeds supply in early Blackwell era
Jetson AGX Orin (inference)	275 TOPS embedded inference module; 64 GB LPDDR5; purpose-built for robotics inference at the edge	Agility Robotics Digit uses Jetson-class compute; Boston Dynamics Atlas development; not used in Waymo or Tesla vehicles (Tesla uses custom silicon; Waymo’s on-vehicle compute is not fully disclosed)	Est. $1,099 developer kit; production module pricing lower at volume
NVIDIA DRIVE platform	DRIVE Orin: 254 TOPS per chip; DRIVE Thor (successor): est. 2,000 TOPS	Waymo Gen 5 reportedly uses NVIDIA hardware; multiple AV startups use DRIVE platform; the standard AV inference compute for companies that do not build custom silicon	DRIVE Orin production pricing est. $500-$2,000 per vehicle; DRIVE Thor pricing est. higher
NVIDIA Isaac (robotics)	Isaac ROS: robotics middleware; Isaac Lab: reinforcement learning simulation framework; Isaac Gym: GPU-accelerated physics simulation	Figure AI, Agility Robotics, and others use NVIDIA’s Isaac simulation stack to generate synthetic training data; GPU-accelerated simulation can generate more diverse training scenarios than physical collection alone	Software: open source; hardware: standard GPU cluster for Isaac simulation workloads

At cluster scale, the economics are significant. A 10,000-GPU H100 cluster, the scale that serious AV training operations require, costs est. $250M-$300M in hardware alone, before data center infrastructure, power, and cooling. The shift to Blackwell B200 (est. 9 petaFLOPS FP8 per chip vs the H100’s est. 3.35 petaFLOPS) means either roughly 2.7x the peak training throughput at the same chip count, or the same throughput at a little over one-third the hardware count. This is why B200 availability and pricing are among the most strategically significant variables in the Physical AI race in 2025-2026.
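
The cluster arithmetic above can be reproduced in a few lines. This is a rough sketch that uses only the per-chip price and peak FP8 estimates quoted in this section (the two throughput figures share the same units, so only their ratio matters). The function names are illustrative, and it assumes delivered throughput scales with peak throughput, which real clusters only approximate:

```python
import math

H100_PRICE_USD = (25_000, 30_000)  # per-chip price estimate quoted in the table
H100_FP8 = 3.35                    # per-chip peak FP8 estimate quoted above
B200_FP8 = 9.0                     # per-chip peak FP8 estimate quoted above


def cluster_hardware_cost(num_gpus: int, price_range: tuple) -> tuple:
    """Low and high hardware-only cost in USD (no power, cooling, or networking)."""
    low, high = price_range
    return num_gpus * low, num_gpus * high


def equivalent_b200_count(num_h100: int) -> int:
    """B200 chips needed to match the peak FP8 throughput of num_h100 H100s."""
    return math.ceil(num_h100 * H100_FP8 / B200_FP8)


if __name__ == "__main__":
    low, high = cluster_hardware_cost(10_000, H100_PRICE_USD)
    print(f"10,000 H100s: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
    print(f"Peak ratio B200/H100: {B200_FP8 / H100_FP8:.2f}x")
    print(f"B200s matching 10,000 H100s: {equivalent_b200_count(10_000)}")
```

On these figures the peak ratio is about 2.7, so matching a 10,000-H100 cluster takes a little over one-third as many B200 chips.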

Section 3 — Tesla Dojo: Custom Training Infrastructure

Tesla’s Dojo supercomputer is the most ambitious compute-infrastructure differentiation play in Physical AI. Rather than renting NVIDIA GPUs or using cloud TPUs, Tesla built a custom training chip (the D1), a custom training tile (25 D1 chips), and a custom cluster unit (the ExaPOD, 120 tiles spread across multiple cabinets). The strategic logic is that Tesla’s specific training workload, processing hundreds of millions of dashcam video clips from a 6M+ vehicle fleet, is specialized enough that a purpose-built chip will beat general-purpose GPU training on cost per FLOP for that task.

Dojo dimension	Status	Strategic implication	Risk / uncertainty
D1 chip architecture	TSMC 7nm process; est. 362 TFLOPS BF16/CFP8 per chip; 25 D1 chips per training tile; high-bandwidth die-to-die interconnect within a tile; designed for video-input neural network training (the primary FSD training workload)	D1’s architecture is optimized for Tesla’s specific workload: processing millions of hours of dashcam video for FSD neural network training; the intra-tile chip-to-chip interconnect bandwidth (est. ~10 TB/s) is tuned for the gradient synchronization patterns of video training	D1’s performance advantage is workload-specific; for general LLM training, H100 or B200 are superior; for Tesla’s video-heavy workload, D1’s interconnect bandwidth may give it an edge
ExaPOD and cluster scale	One training tile = 25 D1 chips; one ExaPOD = 120 training tiles = 3,000 D1 chips, housed across multiple cabinets; multiple ExaPODs form a Dojo supercomputer cluster; Tesla targeted est. 1 exaFLOPS+ of training capacity per ExaPOD	At exaFLOPS scale, Dojo can process significantly more FSD training data per day than Tesla’s prior NVIDIA-based cluster; more training throughput enables more model iterations and faster disengagement rate improvement	Dojo’s actual deployed capacity and per-ExaPOD utilization are not publicly confirmed; Tesla has cited exaFLOPS ambitions, but its precise operational status at scale is unknown
Cost per FLOP vs NVIDIA	Tesla’s thesis: Dojo costs less per FLOP than renting NVIDIA H100s for the video processing workload; if D1 is est. 30-50% cheaper per FLOP than H100 for video, then at roughly 10 exaFLOPS of training capacity Dojo saves Tesla est. $100M-$1B vs cloud NVIDIA	The cost advantage of Dojo (if real) compounds over Tesla’s lifetime: lower training cost per experiment → more iterations per budget → faster model improvement → better FSD → higher attach rate → more revenue → more training data	The cost advantage is Tesla’s thesis and has not been independently verified; NVIDIA has also reduced effective cloud pricing with H100 and B200 competition; actual Dojo cost advantage may be smaller than claimed
Training data flywheel	Tesla’s 6M+ FSD-capable vehicles generate est. tens of millions of miles of training data per week; no competitor can replicate this data volume without a consumer car fleet; Dojo is the processing infrastructure for this uniquely large dataset	Waymo’s training data is smaller by roughly two orders of magnitude (est. 30M+ commercial driverless miles vs est. 6B+ Tesla supervised miles); even with equal training infrastructure efficiency, Tesla’s data volume means its model has seen more scenarios	Tesla’s data is supervised (human-operated) data, not driverless data; the model learns from human driving behavior, which includes human errors as training signal; the quality difference between supervised and driverless training data is a legitimate open question
Dojo vs renting NVIDIA (strategic choice)	Tesla chose to BUILD custom training infrastructure rather than rent cloud NVIDIA GPUs; high-capital, high-risk, high-potential-reward strategy; if Dojo works as designed, Tesla saves billions in training costs over 5-10 years	The build-vs-rent decision is being watched by the industry: if Dojo succeeds, other AV companies may follow; if Dojo struggles, it validates the rent-NVIDIA approach for all competitors	Tesla has indicated continued Dojo investment; the strategy is not being abandoned regardless of near-term performance relative to NVIDIA
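
The ExaPOD target in the table can be sanity-checked from the tile and chip counts. A small sketch using the per-chip estimate quoted above; the function names are illustrative, and the result is a peak figure that says nothing about the utilization Tesla actually achieves:

```python
D1_TFLOPS = 362         # per-chip peak estimate quoted in the table
CHIPS_PER_TILE = 25     # D1 chips per training tile
TILES_PER_EXAPOD = 120  # training tiles per ExaPOD


def exapod_chips() -> int:
    """Total D1 chips in one ExaPOD."""
    return CHIPS_PER_TILE * TILES_PER_EXAPOD


def exapod_peak_exaflops() -> float:
    """Aggregate peak throughput of one ExaPOD (1 exaFLOPS = 1,000,000 TFLOPS)."""
    return exapod_chips() * D1_TFLOPS / 1_000_000


if __name__ == "__main__":
    print(f"D1 chips per ExaPOD: {exapod_chips()}")
    print(f"Peak per ExaPOD: {exapod_peak_exaflops():.3f} exaFLOPS")
```

The product is 3,000 chips and roughly 1.09 exaFLOPS of peak throughput, consistent with the "1 exaFLOP+" per-ExaPOD target.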

Section 4 — Waymo, Google TPU, and Competitor Training Infrastructure

Waymo’s compute advantage is the least publicly discussed but potentially the most durable in the AV space. As a subsidiary of Alphabet, Waymo has access to Google’s TPU infrastructure — one of the most mature and cost-efficient AI training platforms in the world — at terms that are not publicly disclosed but are almost certainly more favorable than public cloud market rates. This is a structural advantage that no AV startup can replicate without acquiring Google.

Compute approach	Who uses it	Training capability	Physical AI relevance
Google TPU v5e / v5p (Waymo)	Waymo (via Alphabet); Google’s internal AI projects; external customers can rent TPUs through Google Cloud, but only at public rates	TPU v5p: est. 459 TFLOPS BF16 per chip; deployed in large Google TPU pods (thousands of chips); among the most mature and cost-efficient AI training platforms in the world	Waymo’s TPU access via Alphabet is a structural advantage over competitors that must rent public cloud GPUs; the cost is effectively subsidized by Alphabet’s infrastructure investment; Waymo accesses TPU capacity without paying public cloud market rates
NVIDIA H100 / B200 clusters (Aurora, Figure AI, Agility, et al.)	Aurora (AV trucking); Figure AI (humanoid robots, via OpenAI GPU access); Agility Robotics; Boston Dynamics; most AV startups without proprietary compute	H100: est. 3.35 petaFLOPS FP8 per chip; B200: est. 9 petaFLOPS FP8 per chip; industry-standard training platform; Physical AI training code written for CUDA runs natively	Competitors using rented NVIDIA H100 / B200 pay market rates (est. $2-$3/hr per H100); at the scale of full training runs, this is a significant operating expense; Aurora’s AV training costs are a meaningful budget line; Figure AI’s VLM training uses OpenAI’s NVIDIA-based infrastructure via partnership
AMD MI300X (emerging)	Some data center operators; potential use by cost-conscious AV companies	MI300X: 192 GB HBM3 (2.4x H100’s 80 GB); est. 2.6 petaFLOPS FP8 per chip; price-competitive with H100 at list price	AMD MI300X is technically competitive, but CUDA ecosystem lock-in limits adoption in Physical AI; all major AV training codebases are CUDA-optimized; migrating to ROCm requires significant engineering investment; AMD is gaining ground but remains a secondary choice for AV AI training in 2026
Tesla FSD chip (on-vehicle inference)	Tesla vehicles (HW3, HW4); designed in-house by Tesla’s silicon team	HW3: 144 TOPS (two chips per vehicle); HW4: est. 350+ TOPS; custom architecture optimized for Tesla’s FSD neural network inference; manufactured at Samsung (HW3) and likely TSMC (HW4)	On-vehicle inference chip designed specifically for Tesla’s FSD model: runs 8 camera feeds through the neural network in real time at low latency; custom silicon enables tighter hardware-software co-design than the NVIDIA DRIVE platform allows; Tesla can optimize chip architecture for its specific neural network
Waymo custom on-vehicle silicon (Gen 6)	Waymo Gen 5 and Gen 6 vehicles	Waymo has not fully publicly disclosed on-vehicle compute details; Gen 5 Jaguar I-PACE reportedly uses NVIDIA hardware; Gen 6 purpose-built vehicle likely uses Waymo custom silicon for improved power efficiency and cost reduction	Waymo’s Gen 6 vehicle is an opportunity to optimize on-vehicle compute: custom silicon reduces cost vs. NVIDIA DRIVE licensing, reduces power consumption (critical for EV range), and enables hardware-software co-design for Waymo’s specific sensor suite and neural network architecture
Simulation infrastructure	All major Physical AI companies	NVIDIA Isaac Gym and Isaac Lab (Agility, Figure, Boston Dynamics, others); Waymo’s CarCraft simulator; Tesla’s proprietary simulation stack; Aurora’s TORCH simulator	Simulation is the force multiplier for Physical AI training data: GPU-accelerated simulation generates synthetic training scenarios orders of magnitude faster than physical data collection; Waymo’s CarCraft simulator generates billions of simulated scenarios per year, covering long-tail edge cases that physical data collection cannot reach efficiently
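
To see why rented GPU capacity becomes a meaningful budget line, the hourly rate estimate in the NVIDIA row can be turned into a run cost. A minimal sketch; the cluster size and duration below are hypothetical inputs chosen for illustration, not figures reported by any company named here:

```python
def rental_cost_usd(num_gpus: int, days: float, rate_per_gpu_hour: float) -> float:
    """Cloud rental cost of running num_gpus continuously for the given number of days."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours * rate_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical run: 10,000 H100s for 30 days at the est. $2-$3/hr range.
    for rate in (2.0, 3.0):
        cost = rental_cost_usd(10_000, 30, rate)
        print(f"${rate:.2f}/GPU-hr -> ${cost / 1e6:.1f}M")
```

Under those assumptions a single month-long run costs $14.4M to $21.6M, before storage, networking, or engineering time.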

Section 5 — Physical AI Compute Benchmark Scorecard

Compute dimension	Waymo (Alphabet)	Tesla	Aurora	Figure AI	2028 outlook
Training infrastructure	Google TPU pods (via Alphabet) + NVIDIA GPUs; structural cost advantage from Alphabet-subsidized infrastructure; access to one of the world’s largest and most mature AI training platforms	Dojo (custom D1 chip) targeting exaFLOP scale; build-vs-rent strategic bet; if Dojo works as designed, lowest cost per FLOP for Tesla’s video-processing workload	Rented NVIDIA H100 / B200 clusters; no proprietary training infrastructure; training cost is a significant operating expense	NVIDIA GPUs via OpenAI partnership infrastructure; OpenAI provides GPU access as part of Figure-OpenAI collaboration	Tesla’s Dojo bet becomes clearer; if Dojo reaches multi-exaFLOP scale at competitive cost, Tesla has structural training cost advantage; Waymo’s Google TPU access is durable regardless of Dojo outcome
On-vehicle inference	NVIDIA DRIVE or Waymo custom silicon (Gen 6 details not fully disclosed); per-vehicle compute cost is a meaningful component of Waymo vehicle economics	Tesla FSD HW4 chip: est. 350+ TOPS; in-house design; hardware-software co-design advantage; lower cost than NVIDIA DRIVE licensing at scale	Aurora Driver: NVIDIA-based compute; system designed for Class 8 truck integration; est. 3+ redundant compute units per truck for functional safety compliance	Not applicable (humanoid robot, not vehicle); Figure 02 uses custom compute; Agility Robotics Digit uses Jetson-class on-board compute	Tesla’s custom silicon path likely continues with HW5; Waymo’s Gen 6 custom silicon deployment narrows the on-vehicle compute gap vs. NVIDIA-based Gen 5
Simulation infrastructure	CarCraft: Waymo’s proprietary simulator; billions of simulated scenarios per year; among the most mature simulation stacks in AV industry	Proprietary simulation stack; Tesla uses real-world data as primary training signal with simulation as augmentation; less simulation-dependent than Waymo by design	TORCH simulator; GPU-accelerated; Aurora uses simulation for corner cases not covered in physical test miles	NVIDIA Isaac Lab + Figure’s custom robotics simulation; physics-accurate humanoid task simulation for manipulation and locomotion policies	Simulation quality increasingly critical as physical data collection alone cannot efficiently cover long-tail edge cases; all companies investing heavily in simulation fidelity and domain randomization
Data volume (training data)	Est. 30M+ commercial driverless miles; highest-quality driverless data (no human driver supervision artifacts); but a far smaller data volume than Tesla’s	Est. 6B+ supervised miles; largest training dataset in AV; data flywheel via 6M+ vehicle fleet; supervised (human-operated) data includes human driving errors as training signal	Est. 10M+ highway commercial miles since April 2025 commercial launch; high-quality highway data; limited to I-45 Dallas-Houston corridor initially	Robot task data: early stage; NVIDIA Isaac simulation fills the physical data gap; Figure AI deploys in BMW factory to generate real-world robot task data	Tesla’s data volume advantage is structural and growing; Waymo’s driverless data quality is superior for driverless model training; Aurora’s highway data is the highest-quality long-haul trucking dataset in the industry
Compute overall verdict	The Physical AI compute race is not yet decided. NVIDIA remains the dominant provider of training infrastructure for nearly all Physical AI companies — a position that generates enormous revenue and reinforces NVIDIA’s ecosystem moat. Tesla’s Dojo bet is the most ambitious compute-infrastructure differentiation in Physical AI, with a credible thesis (proprietary chip optimized for video training + largest training dataset = structural model improvement advantage) but unproven at the full scale Tesla has targeted. Waymo’s Google TPU access is the least visible but most durable structural advantage in AV training compute — Alphabet’s TPU infrastructure is among the most mature and cost-efficient in the world, and Waymo’s access to it at subsidized rates is an underappreciated competitive moat. For investors tracking the Physical AI compute race, the KPI to watch is not raw FLOP count — it is training throughput per dollar and the resulting rate of model improvement, measured by disengagement rate decline and geographic expansion rate.
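
The "throughput per dollar" KPI in the verdict can be approximated from list-level numbers. A rough sketch that compares B200 to H100 using the per-chip peak FP8 and price estimates from Section 2; the midpoint prices and function names are assumptions for illustration, and a real comparison would also need utilization, power, and networking costs:

```python
def throughput_per_dollar(peak_fp8: float, price_usd: float) -> float:
    """Peak FP8 throughput bought per dollar of chip price."""
    return peak_fp8 / price_usd


def relative_value(peak_a: float, price_a: float, peak_b: float, price_b: float) -> float:
    """How many times more peak throughput per dollar chip A offers than chip B."""
    return throughput_per_dollar(peak_a, price_a) / throughput_per_dollar(peak_b, price_b)


if __name__ == "__main__":
    # Midpoints of the article's price ranges (assumed): B200 $40K, H100 $27.5K.
    # Peak figures 9 and 3.35 share the same units, so the ratio is unit-free.
    ratio = relative_value(9.0, 40_000, 3.35, 27_500)
    print(f"B200 vs H100 peak throughput per dollar: {ratio:.2f}x")
```

On these estimates the B200 delivers roughly 1.85x the peak throughput per hardware dollar, a smaller gap than the per-chip ratio because the newer chip also costs more.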

Note: Figures labeled “(est.)” are directional estimates based on publicly available information as of mid-2026. Hardware pricing, cluster-scale economics, and training compute capacities are not fully publicly disclosed by the companies discussed. This article does not constitute investment advice.