2026-06-18
Physical AI Compute Race 2026 — NVIDIA B200 vs Tesla Dojo vs Google TPU: The AV and Robotics Training Infrastructure Benchmark
NVIDIA’s B200 (est. 9 petaFLOPS FP8 per chip) leads the GPU stack that powers virtually all AV AI training. Tesla Dojo bets on custom silicon. Waymo uses Google TPU. Compute decides the race.
Article 205 in the Physical AI Benchmark Series
The Physical AI race is, at its core, a compute race. The companies that can run more training experiments per unit time will iterate faster, discover better driving and robotics policies sooner, and ultimately deploy superior products first. This is the lesson of large language models applied directly to physical systems: scaling laws hold, and the entities with more training compute win over medium-to-long time horizons. In 2026, three training compute ecosystems are competing for dominance in the Physical AI stack: NVIDIA’s GPU clusters (H100, H200, Blackwell B200), Tesla’s proprietary Dojo supercomputer, and Google’s TPU infrastructure (accessed by Waymo via Alphabet ownership). The architecture, cost structure, and strategic implications of each will shape who wins the AV and robotics race in the back half of this decade.
Section 1 — Why Compute Is the Physical AI Battleground
Physical AI’s progress rate is compute-bound in the same way that large language model progress has been compute-bound. The more training GPU-hours an AV company can apply to its neural network, the faster the disengagement rate falls, the broader the geography the model generalizes to, and the longer the tail of edge cases it handles correctly. Scaling laws are not a hope in physical AI; they are an observed empirical regularity that every serious competitor is now building its roadmap around.
| Principle | Explanation | Physical AI implication |
|---|---|---|
| Scaling laws apply | Neural network performance improves predictably with more compute, more data, and larger models (Chinchilla scaling laws; OpenAI scaling papers) | More training compute on more data → better driving or robotics policy, the same mechanism by which LLMs improve with scale |
| Two distinct compute environments | Training compute (cluster-scale GPU/TPU, thousands of chips) vs inference compute (on-vehicle chip, runs the deployed model in real time) | Different optimization targets: training = maximize throughput and minimize cost per experiment; inference = minimize latency and power per decision |
| Training data volume | Tesla’s fleet generates tens of millions of miles of training data per week (est.); processing it all requires enormous training compute | Without sufficient training compute, the data flywheel slows: collected data sits unprocessed and the data volume advantage is wasted |
| Iteration speed compounds | Faster training → more experiments per unit time → faster architecture discovery; over 2-3 years, this compounds into a substantial model quality lead | A company running 3x more training experiments per dollar discovers better policies 3x faster; compounding produces large gaps over 24-36 months |
| Inference latency is safety-critical | A 100ms perception delay at 60 mph = 2.7 meters of blind travel | On-vehicle inference must be fast enough to react to road hazards before the vehicle has traveled a dangerous distance |
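The latency figure in the last row can be checked directly. The following is a minimal Python sketch that assumes the vehicle holds a constant speed during the delay; the function name `blind_travel_m` is illustrative and does not come from any AV software stack.

```python
# Distance a vehicle covers while the perception stack is still "blind".
# Assumption: constant speed over the latency window (no braking yet).

MPH_TO_MPS = 1609.344 / 3600.0  # exact metres per mile / seconds per hour


def blind_travel_m(speed_mph: float, latency_ms: float) -> float:
    """Return metres travelled during `latency_ms` at `speed_mph`."""
    return speed_mph * MPH_TO_MPS * (latency_ms / 1000.0)


if __name__ == "__main__":
    for latency_ms in (50, 100, 200):
        metres = blind_travel_m(60.0, latency_ms)
        print(f"{latency_ms} ms at 60 mph -> {metres:.2f} m")
```

At 60 mph and 100 ms this gives about 2.68 m, which the table rounds to 2.7 m. The distance scales linearly in both speed and latency, so halving inference latency halves the blind distance.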
Training compute is the “lab” where AV and robotics AI models are built from data. It happens on massive GPU or TPU clusters — thousands to tens of thousands of chips — in data centers. The battle here is about cost per FLOP, cluster throughput, interconnect bandwidth, and memory capacity for large models. Physical AI models are typically large: transformer-based perception models, diffusion policies for robotics manipulation, end-to-end video-to-action neural networks. Running gradient descent on these models, across billions of training frames, requires compute at a scale that is measurable in exaFLOPS.
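To make the scale concrete, here is a rough Python sketch of how training time follows from model size, data volume, and cluster throughput. It assumes the common rule of thumb from the LLM scaling literature that training a dense transformer costs about 6 FLOPs per parameter per training token; whether that constant carries over to video-to-action models is an assumption. Every input value below is a hypothetical placeholder, not a figure from any company discussed here.

```python
# Back-of-envelope training time from a FLOPs budget.
# Assumption: total training FLOPs ~= 6 * parameters * training tokens.

SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, n_chips: int,
                  peak_flops_per_chip: float, utilization: float) -> float:
    """Estimate wall-clock days for one training run.

    utilization is the fraction of peak throughput actually sustained.
    """
    total_flops = 6.0 * params * tokens
    sustained_flops_per_s = n_chips * peak_flops_per_chip * utilization
    return total_flops / sustained_flops_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the calculation.
    days = training_days(params=1e9, tokens=1e12, n_chips=1_000,
                         peak_flops_per_chip=1e15, utilization=0.4)
    print(f"Estimated training time: {days:.2f} days")
```

The sketch shows why iteration speed tracks cluster size: doubling the chip count or the sustained utilization halves the estimated time per experiment.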
Inference compute is the “vehicle”: the deployed model running in real time on the physical system. For AVs, this is the chip embedded in the car that must process eight camera feeds, run the full neural network, and output steering, acceleration, and braking commands, all within milliseconds. For humanoid robots, it is the onboard compute that maps sensor inputs to motor commands at whatever cycle frequency the physical task demands. Tesla’s FSD HW4 chip (est. 350+ TOPS) and Waymo’s on-vehicle compute (NVIDIA DRIVE-based or custom, depending on generation) are the inference side of this race.
The AMD factor is real but secondary in 2026. AMD’s MI300X offers 192 GB of HBM3 memory (versus the H100’s 80 GB) and an estimated 2.6 petaFLOPS of FP8 throughput per chip, which are competitive raw specs. But CUDA ecosystem lock-in makes switching costly: virtually all AV training code is written for CUDA, and AMD’s ROCm software stack is less mature. Some AV companies may use AMD clusters for cost reasons, but NVIDIA remains the industry default in 2026.
Section 2 — NVIDIA’s AV and Robotics Training Cluster Dominance
NVIDIA supplies the training compute infrastructure for virtually every AV and robotics company that does not have proprietary training silicon. The H100, H200, and Blackwell B200 form the standard stack. NVIDIA’s Isaac simulation suite (Isaac Lab, Isaac Gym) adds GPU-accelerated physics simulation for robotics training data generation. The DRIVE platform handles on-vehicle inference for AV companies that do not build custom silicon.
| NVIDIA chip | Specs | Physical AI use case | Price / availability (est.) |
|---|---|---|---|
| H100 SXM5 | 80 GB HBM3; 3.35 petaFLOPS FP8 per chip (est.); NVLink 4.0; 700W TDP; DGX H100 = 8 H100 per system | Primary training chip for virtually all AV companies in 2024-2025; Waymo, Aurora, Figure AI, Agility, Boston Dynamics Atlas; used for training perception models, motion prediction, trajectory optimization | $25K-$30K per chip (est.); DGX H100 system $200K-$250K (est.); cloud H100: $2-$3/hr per GPU (est.) |
| H200 SXM5 | 141 GB HBM3e (75% more memory than H100); same training throughput as H100 on compute-bound workloads; memory bandwidth advantage on memory-bound tasks | Large-model Physical AI training (vision-language models like Figure AI’s VLM, Tesla’s end-to-end model); higher memory capacity enables larger batch sizes without memory overflow | Est. $30K-$40K per chip (est.); successor to H100 in 2024-2025 deployment cycles |
| B100 / B200 (Blackwell) | B200: 192 GB HBM3e (est.); 9 petaFLOPS FP8 per chip (est.), nearly 3x H100; NVLink 5.0; significantly higher memory bandwidth | Next-generation AV training; Figure AI, Tesla, Aurora all likely transitioning training clusters to Blackwell in 2025-2026; nearly 3x throughput per chip shortens training time proportionally or enables larger models in the same time | $35K-$45K per chip (est.); availability ramping 2025-2026; demand exceeds supply in early Blackwell era |
| Jetson AGX Orin (inference) | 275 TOPS edge inference module; 64 GB LPDDR5; purpose-built for robotics inference at the edge | Agility Robotics Digit uses Jetson-class compute; Boston Dynamics Atlas development; not used in Waymo or Tesla vehicles | $1,099 developer kit (est.); production module pricing lower at volume |
| NVIDIA DRIVE platform | DRIVE Orin: 254 TOPS per chip; DRIVE Thor (successor): est. 2,000 TOPS (est.) | Waymo Gen 5 reportedly uses NVIDIA hardware; multiple AV startups use DRIVE platform; the standard AV inference compute for companies that do not build custom silicon | DRIVE Orin production pricing est. $500-$2,000 per vehicle (est.); DRIVE Thor pricing est. higher |
| NVIDIA Isaac (robotics) | Isaac ROS: robotics middleware; Isaac Lab: reinforcement learning simulation framework; Isaac Gym: GPU-accelerated physics simulation | Figure AI, Agility Robotics, and others use NVIDIA’s Isaac simulation stack to generate synthetic training data; GPU-accelerated simulation can generate more diverse training scenarios than physical collection alone | Software: open source; hardware: standard GPU cluster for Isaac simulation workloads |
At cluster scale, the economics are significant. A 10,000-GPU H100 cluster, the scale that serious AV training operations require, costs an estimated $250M-$300M in hardware alone, before data center infrastructure, power, and cooling. The shift to Blackwell B200 (est. 9 petaFLOPS FP8 per chip vs the H100’s est. 3.35 petaFLOPS) means either roughly 2.7x the training throughput at the same chip count, or the same throughput with a little over one-third of the chips. Because each B200 costs more than an H100, the capex saving is smaller than the chip-count saving. This is why B200 availability and pricing are among the most strategically significant variables in the Physical AI race in 2025-2026.
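The cluster arithmetic can be reproduced with a short Python sketch. It uses only the estimated per-chip prices and the relative per-chip FP8 throughput figures quoted in this article (3.35 for H100, 9 for B200, in the same unit); the function names are illustrative, and the result ignores networking, power, cooling, and real-world utilization.

```python
import math

# Hardware-only capex and chip-count equivalence, using this article's estimates.


def cluster_capex_usd(n_chips: int, price_low: float, price_high: float):
    """Return the (low, high) hardware cost range for n_chips."""
    return n_chips * price_low, n_chips * price_high


def chips_for_same_throughput(n_old: int, old_per_chip: float,
                              new_per_chip: float) -> int:
    """Chips of the new type needed to match the old cluster's peak throughput."""
    return math.ceil(n_old * old_per_chip / new_per_chip)


if __name__ == "__main__":
    h100_low, h100_high = cluster_capex_usd(10_000, 25_000, 30_000)
    print(f"10,000 H100s: ${h100_low / 1e6:.0f}M-${h100_high / 1e6:.0f}M")

    n_b200 = chips_for_same_throughput(10_000, 3.35, 9.0)
    b200_low, b200_high = cluster_capex_usd(n_b200, 35_000, 45_000)
    print(f"Equivalent B200 count: {n_b200}")
    print(f"B200 capex: ${b200_low / 1e6:.0f}M-${b200_high / 1e6:.0f}M")
```

Under these estimates, matching the H100 cluster takes about 37% of the chip count, but since each B200 is priced higher, the hardware bill falls by closer to half than to two-thirds.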
Section 3 — Tesla Dojo: Custom Training Infrastructure
Tesla’s Dojo supercomputer is the most ambitious compute-infrastructure differentiation play in Physical AI. Rather than renting NVIDIA GPUs or using cloud TPUs, Tesla built a custom training chip (the D1), a custom training tile (25 D1 chips), and a custom cluster unit (the ExaPOD, 120 tiles spread across multiple cabinets). The strategic logic is that Tesla’s specific training workload, processing hundreds of millions of dashcam video clips from a 6M+ vehicle fleet, is specialized enough that a purpose-built chip will beat general-purpose GPU training on cost per FLOP for that task.
| Dojo dimension | Status | Strategic implication | Risk / uncertainty |
|---|---|---|---|
| D1 chip architecture | TSMC 7nm process; 362 TFLOPS BF16/CFP8 per chip (est.); 25 D1 chips per training tile; high-bandwidth die-to-die interconnect within a tile; designed for video-input neural network training (the primary FSD training workload) | D1’s architecture is optimized for Tesla’s specific workload: processing millions of hours of dashcam video for FSD neural network training; the intra-tile chip-to-chip interconnect bandwidth (est. ~10 TB/s) is tuned for the gradient synchronization patterns of video training | D1’s performance advantage is workload-specific; for general LLM training, H100 or B200 are superior; for Tesla’s video-heavy workload, D1’s interconnect bandwidth may give it an edge |
| ExaPOD and cluster scale | One training tile = 25 D1 chips; one ExaPOD = 120 training tiles = 3,000 D1 chips, housed across multiple cabinets; multiple ExaPODs form a Dojo supercomputer cluster; Tesla targeted 1 exaFLOPS+ of training capacity per ExaPOD (est.) | At exaFLOPS scale, Dojo can process significantly more FSD training data per day vs. Tesla’s prior NVIDIA-based cluster; more training throughput enables more model iterations and faster disengagement rate improvement | Dojo’s actual deployed capacity and per-ExaPOD utilization are not publicly confirmed; Tesla has cited exaFLOPS ambitions but its precise operational status at scale is unconfirmed |
| Cost per FLOP vs NVIDIA | Tesla’s thesis: Dojo costs less per FLOP than renting NVIDIA H100s for the video processing workload; if D1 is est. 30-50% cheaper per FLOP than H100 for video (est.), over a 10-exaFLOP training run, Dojo saves Tesla est. $100M-$1B vs cloud NVIDIA (est.) | The cost advantage of Dojo (if real) compounds over Tesla’s lifetime: lower training cost per experiment → more iterations per budget → faster model improvement → better FSD → higher attach rate → more revenue → more training data | The cost advantage is Tesla’s thesis and has not been independently verified; NVIDIA has also reduced effective cloud pricing with H100 and B200 competition; actual Dojo cost advantage may be smaller than claimed |
| Training data flywheel | Tesla’s 6M+ FSD-capable vehicles generate tens of millions of miles of training data per week (est.); no competitor can replicate this data volume without a consumer car fleet; Dojo is the processing infrastructure for this uniquely large dataset | Waymo’s training data is smaller by roughly two orders of magnitude (30M+ commercial driverless miles vs 6B+ Tesla supervised miles, both est.); even with equal training infrastructure efficiency, Tesla’s data volume means its model has seen more scenarios | Tesla’s data is supervised (human-operated) data, not driverless data; the model learns from human driving behavior, which includes human errors as training signal; the quality difference between supervised and driverless training data is a legitimate open question |
| Dojo vs renting NVIDIA (strategic choice) | Tesla chose to BUILD custom training infrastructure rather than rent cloud NVIDIA GPUs; high-capital, high-risk, high-potential-reward strategy; if Dojo works as designed, Tesla saves billions in training costs over 5-10 years | The build-vs-rent decision is being watched by the industry: if Dojo succeeds, other AV companies may follow; if Dojo struggles, it validates the rent-NVIDIA approach for all competitors | Tesla has indicated continued Dojo investment; the strategy is not being abandoned regardless of near-term performance relative to NVIDIA |
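The ExaPOD figures in the table are internally consistent, which a few lines of Python can confirm. The sketch below uses only the table’s own numbers (25 chips per tile, 120 tiles per ExaPOD, an estimated 362 TFLOPS per D1 chip) and reports peak throughput; it says nothing about sustained utilization, which is not publicly confirmed.

```python
# Peak throughput of one ExaPOD from the per-chip estimate in the table above.

CHIPS_PER_TILE = 25
TILES_PER_EXAPOD = 120
D1_TFLOPS = 362.0  # estimated per-chip peak, as quoted in the table


def exapod_chip_count() -> int:
    """Number of D1 chips in one ExaPOD."""
    return CHIPS_PER_TILE * TILES_PER_EXAPOD


def exapod_peak_exaflops() -> float:
    """Peak ExaPOD throughput in exaFLOPS (1 exaFLOPS = 1e6 TFLOPS)."""
    return exapod_chip_count() * D1_TFLOPS / 1e6


if __name__ == "__main__":
    print(f"D1 chips per ExaPOD: {exapod_chip_count()}")
    print(f"Peak throughput: {exapod_peak_exaflops():.3f} exaFLOPS")
```

The result, about 1.09 exaFLOPS at peak, matches the “1 exaFLOPS+ per ExaPOD” target cited in the table.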
Section 4 — Waymo, Google TPU, and Competitor Training Infrastructure
Waymo’s compute advantage is the least publicly discussed but potentially the most durable in the AV space. As a subsidiary of Alphabet, Waymo has access to Google’s TPU infrastructure — one of the most mature and cost-efficient AI training platforms in the world — at terms that are not publicly disclosed but are almost certainly more favorable than public cloud market rates. This is a structural advantage that no AV startup can replicate without acquiring Google.
| Compute approach | Who uses it | Training capability | Physical AI relevance |
|---|---|---|---|
| Google TPU v5e / v5p (Waymo) | Waymo (via Alphabet); Google’s internal AI projects; not generally available to competitors | TPU v5p: est. 459 TFLOPS BF16 per chip (est.); deployed in large Google TPU pods (thousands of chips); among the most mature and cost-efficient AI training platforms in the world | Waymo’s TPU access via Alphabet is a structural advantage over competitors that must rent public cloud GPUs; the cost is effectively subsidized by Alphabet’s infrastructure investment; Waymo accesses TPU capacity without paying public cloud market rates |
| NVIDIA H100 / B200 clusters (Aurora, Figure AI, Agility, et al.) | Aurora (AV trucking); Figure AI (humanoid robots, via OpenAI GPU access); Agility Robotics; Boston Dynamics; most AV startups without proprietary compute | H100: 3.35 petaFLOPS FP8 per chip (est.); B200: 9 petaFLOPS FP8 per chip (est.); industry-standard training platform; all Physical AI training code written for CUDA runs natively | Competitors using rented NVIDIA H100 / B200 pay market rates ($2-$3/hr per H100 (est.)); at the scale of full training runs, this is a significant operating expense; Aurora’s AV training costs are a meaningful budget line; Figure AI’s VLM training uses OpenAI’s NVIDIA-based infrastructure via partnership |
| AMD MI300X (emerging) | Some data center operators; potential use by cost-conscious AV companies | MI300X: 192 GB HBM3 (2.4x H100’s 80 GB); 2.6 petaFLOPS FP8 per chip (est.); price-competitive with H100 at list price | AMD MI300X is technically competitive but CUDA ecosystem lock-in limits adoption in Physical AI; all major AV training codebases are CUDA-optimized; migrating to ROCm requires significant engineering investment; AMD is gaining ground but remains a secondary choice for AV AI training in 2026 |
| Tesla FSD chip (on-vehicle inference) | Tesla vehicles (HW3, HW4); designed in-house by Tesla’s silicon team | HW3: 144 TOPS (two chips per vehicle); HW4: est. 350+ TOPS (est.); custom architecture optimized for Tesla’s FSD neural network inference; manufactured at Samsung (HW3) and likely TSMC (HW4) | On-vehicle inference chip designed specifically for Tesla’s FSD model: runs 8 camera feeds through the neural network in real time at low latency; custom silicon enables tighter hardware-software co-design vs. using NVIDIA DRIVE platform; Tesla can optimize chip architecture for its specific neural network |
| Waymo custom on-vehicle silicon (Gen 6) | Waymo Gen 5 and Gen 6 vehicles | Waymo has not fully publicly disclosed on-vehicle compute details; Gen 5 Jaguar I-PACE reportedly uses NVIDIA hardware; Gen 6 purpose-built vehicle likely uses Waymo custom silicon for improved power efficiency and cost reduction | Waymo’s Gen 6 vehicle is an opportunity to optimize on-vehicle compute: custom silicon reduces cost vs. NVIDIA DRIVE licensing, reduces power consumption (critical for EV range), and enables hardware-software co-design for Waymo’s specific sensor suite and neural network architecture |
| Simulation infrastructure | All major Physical AI companies | NVIDIA Isaac Gym and Isaac Lab (Agility, Figure, Boston Dynamics, others); Waymo’s CarCraft simulator; Tesla’s proprietary simulation stack; Aurora’s TORCH simulator | Simulation is the force multiplier for Physical AI training data: GPU-accelerated simulation generates synthetic training scenarios orders of magnitude faster than physical data collection; Waymo’s CarCraft simulator generates billions of simulated scenarios per year, covering long-tail edge cases that physical data collection cannot reach efficiently |
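The rent-versus-own trade-off behind the table can be sketched in Python using this article’s estimated H100 figures ($2-$3 per GPU-hour in the cloud, $25K-$30K per chip to buy). The function names are illustrative, the 1,000-GPU, 30-day run is a hypothetical example, and the break-even calculation deliberately ignores power, cooling, networking, staffing, and idle time, all of which push the real break-even point later.

```python
# Cloud rental cost and a naive rent-vs-buy break-even, using estimated prices.


def rental_cost_usd(n_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Total cloud bill for n_gpus running for `hours`."""
    return n_gpus * hours * usd_per_gpu_hour


def break_even_hours(chip_price_usd: float, usd_per_gpu_hour: float) -> float:
    """GPU-hours of rental that equal the purchase price of one chip."""
    return chip_price_usd / usd_per_gpu_hour


if __name__ == "__main__":
    hours = 30 * 24  # hypothetical 30-day training run
    low = rental_cost_usd(1_000, hours, 2.0)
    high = rental_cost_usd(1_000, hours, 3.0)
    print(f"1,000 H100s for 30 days: ${low / 1e6:.2f}M-${high / 1e6:.2f}M")

    fastest = break_even_hours(25_000, 3.0)  # cheap chip, expensive rental
    slowest = break_even_hours(30_000, 2.0)  # expensive chip, cheap rental
    print(f"Break-even: {fastest / 24:.0f}-{slowest / 24:.0f} days of continuous use")
```

Under these estimates a chip pays for itself after roughly one to two years of continuous rental-equivalent use, which is why subsidized or in-house capacity matters most to companies that train around the clock.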
Section 5 — Physical AI Compute Benchmark Scorecard
| Compute dimension | Waymo (Alphabet) | Tesla | Aurora | Figure AI | 2028 outlook |
|---|---|---|---|---|---|
| Training infrastructure | Google TPU pods (via Alphabet) + NVIDIA GPUs; structural cost advantage from Alphabet-subsidized infrastructure; access to one of the world’s largest and most mature AI training platforms | Dojo (custom D1 chip) targeting exaFLOP scale; build-vs-rent strategic bet; if Dojo works as designed, lowest cost per FLOP for Tesla’s video-processing workload | Rented NVIDIA H100 / B200 clusters; no proprietary training infrastructure; training cost is a significant operating expense | NVIDIA GPUs via OpenAI partnership infrastructure; OpenAI provides GPU access as part of Figure-OpenAI collaboration | Tesla’s Dojo bet becomes clearer; if Dojo reaches multi-exaFLOP scale at competitive cost, Tesla has structural training cost advantage; Waymo’s Google TPU access is durable regardless of Dojo outcome |
| On-vehicle inference | NVIDIA DRIVE or Waymo custom silicon (Gen 6 details not fully disclosed (est.)); per-vehicle compute cost is a meaningful component of Waymo vehicle economics | Tesla FSD HW4 chip: est. 350+ TOPS (est.); in-house design; hardware-software co-design advantage; lower cost than NVIDIA DRIVE licensing at scale | Aurora Driver: NVIDIA-based compute; system designed for Class 8 truck integration; est. 3+ redundant compute units per truck (est.) for functional safety compliance | Not applicable (humanoid robot, not vehicle); Figure 02 uses custom compute; Agility Robotics Digit uses Jetson-class on-board compute | Tesla’s custom silicon path likely continues with HW5; Waymo’s Gen 6 custom silicon deployment narrows the on-vehicle compute gap vs. NVIDIA-based Gen 5 |
| Simulation infrastructure | CarCraft: Waymo’s proprietary simulator; billions of simulated scenarios per year; among the most mature simulation stacks in AV industry | Proprietary simulation stack; Tesla uses real-world data as primary training signal with simulation as augmentation; less simulation-dependent than Waymo by design | TORCH simulator; GPU-accelerated; Aurora uses simulation for corner cases not covered in physical test miles | NVIDIA Isaac Lab + Figure’s custom robotics simulation; physics-accurate humanoid task simulation for manipulation and locomotion policies | Simulation quality increasingly critical as physical data collection alone cannot efficiently cover long-tail edge cases; all companies investing heavily in simulation fidelity and domain randomization |
| Data volume (training data) | 30M+ commercial driverless miles (est.); highest-quality driverless data (no human driver supervision artifacts); but a far smaller data volume than Tesla’s | 6B+ supervised miles (est.); largest training dataset in AV; data flywheel via 6M+ vehicle fleet; supervised (human-operated) data includes human driving errors as training signal | 10M+ highway commercial miles (est.) since April 2025 commercial launch; high-quality highway data; limited to I-45 Dallas-Houston corridor initially | Robot task data: early stage; NVIDIA Isaac simulation fills the physical data gap; Figure AI deploys in BMW factory to generate real-world robot task data | Tesla’s data volume advantage is structural and growing; Waymo’s driverless data quality is superior for driverless model training; Aurora’s highway data is the highest-quality long-haul trucking dataset in the industry |
| Compute overall verdict | The Physical AI compute race is not yet decided. NVIDIA remains the dominant provider of training infrastructure for nearly all Physical AI companies — a position that generates enormous revenue and reinforces NVIDIA’s ecosystem moat. Tesla’s Dojo bet is the most ambitious compute-infrastructure differentiation in Physical AI, with a credible thesis (proprietary chip optimized for video training + largest training dataset = structural model improvement advantage) but unproven at the full scale Tesla has targeted. Waymo’s Google TPU access is the least visible but most durable structural advantage in AV training compute — Alphabet’s TPU infrastructure is among the most mature and cost-efficient in the world, and Waymo’s access to it at subsidized rates is an underappreciated competitive moat. For investors tracking the Physical AI compute race, the KPI to watch is not raw FLOP count — it is training throughput per dollar and the resulting rate of model improvement, measured by disengagement rate decline and geographic expansion rate. |
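The verdict’s suggested KPI, training throughput per dollar, can be expressed as a simple ratio. The Python sketch below is one possible formulation, not a standard industry metric: it divides sustained FLOPs per hour by the hourly cost of a chip. The platform names and every number in the example are hypothetical placeholders, since real utilization and internal pricing are not publicly disclosed.

```python
# One possible "training throughput per dollar" metric (an illustrative definition).

SECONDS_PER_HOUR = 3_600


def flops_per_dollar(peak_flops: float, utilization: float,
                     usd_per_chip_hour: float) -> float:
    """Sustained FLOPs delivered per dollar of compute spend."""
    return peak_flops * utilization * SECONDS_PER_HOUR / usd_per_chip_hour


if __name__ == "__main__":
    # Hypothetical platforms: (peak FLOPS per chip, utilization, $/chip-hour).
    platforms = {
        "platform_a": (1.0e15, 0.50, 2.00),
        "platform_b": (2.0e15, 0.35, 3.00),
    }
    for name, (peak, util, rate) in platforms.items():
        print(f"{name}: {flops_per_dollar(peak, util, rate):.2e} FLOPs per dollar")
```

The example illustrates the verdict’s point: a chip with twice the peak FLOPS can still deliver less training per dollar if its utilization is lower or its hourly cost is higher.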
Note: Figures labeled “(est.)” are directional estimates based on publicly available information as of mid-2026. Hardware pricing, cluster-scale economics, and training compute capacities are not fully publicly disclosed by the companies discussed. This article does not constitute investment advice.
Sources
- NVIDIA H100 and B200 architecture specs (NVIDIA)
- Tesla Dojo AI training infrastructure (Tesla AI)
- Google TPU v5 cloud pricing (Google Cloud)
- AMD MI300X architecture (AMD)