2026-06-18
Physical AI Compute — Waymo Google Cloud TPU vs Tesla Dojo D1: Training Infrastructure Benchmark 2026
Waymo uses Google TPU pods and 15B simulated miles daily. Tesla built Dojo D1 for video training while running NVIDIA H100 clusters in parallel as Dojo scales.
Overview
The AI training compute infrastructure is the engine behind each company’s ability to improve its autonomous driving models. Waymo, as an Alphabet subsidiary, uses Google Cloud TPUs — the same compute ecosystem that powers Gemini and other Google AI systems. Tesla built Dojo, a custom supercomputer using proprietary D1 chips designed specifically for training on video data at massive scale. This article benchmarks the two compute approaches — what each company has, what it costs, and what it means for AI model improvement pace. This is article 165 in the Physical AI Benchmark Series.
Section 1 — Waymo’s Compute Stack: Google Cloud + TPU Ecosystem
Waymo’s training infrastructure is inseparable from its position as an Alphabet subsidiary. Access to Google’s TPU pods, among the most capable AI training systems in production, is a structural advantage that no independent AV startup could readily replicate.
| Compute dimension | Waymo detail | Strategic significance |
|---|---|---|
| Primary training infrastructure | Waymo uses Google Cloud TPUs for neural network training; as an Alphabet subsidiary, Waymo has access to Google’s internal TPU pods — the same infrastructure used to train Gemini and other Google AI systems | Being an Alphabet subsidiary gives Waymo access to the world’s most advanced AI training infrastructure at marginal cost; no AV startup could afford equivalent compute independently |
| Google TPU v4/v5 generation | Google’s TPU v4 pods deliver approximately 1.1 exaFLOPS of peak compute per 4,096-chip pod; TPU v5 (announced 2023) improved performance per watt by 2x or more (est.); Waymo has access to these resources as needed | TPU v5 performance represents best-in-class training throughput for transformer and convolutional architectures, the types used in AV perception and planning |
| Google DeepMind synergies | Waymo has potential access to DeepMind research talent and methodology (both are Alphabet subsidiaries); DeepMind’s work on AlphaFold, Gemini, and robotics overlaps with AV AI challenges | Cross-subsidiary knowledge transfer is not guaranteed or automatic, but organizational proximity matters; DeepMind’s robotics research is directly relevant to Waymo’s prediction and planning problems |
| Simulation compute (CarCraft) | Waymo’s CarCraft simulation system runs 15 billion simulated miles per day (est.) across Google Cloud; simulating rare, dangerous, and novel scenarios at this scale requires massive parallel compute | 15B simulated miles per day means Waymo can train on extremely rare edge cases (1-in-a-million scenarios) that real-world miles could never provide in sufficient volume; Google Cloud’s elastic scale makes this feasible |
| Cost structure | Waymo does not pay market rate for Google Cloud compute; as an Alphabet subsidiary, compute costs are effectively subsidized; Waymo’s training budget is not independently disclosed | This subsidy is an enormous structural advantage: an independent AV startup paying $1B or more per year for equivalent Google Cloud compute would face a capital constraint that Waymo does not |
| HD mapping compute | Waymo’s HD maps are generated and updated using Google Maps’ base data plus Waymo-specific centimeter-level lidar enrichment; processing the raw lidar point clouds into navigable HD maps requires substantial compute | Google Maps’ existing compute infrastructure for map rendering and processing is leveraged for Waymo’s HD map generation — another invisible subsidy from the Alphabet relationship |
| Compute strategy verdict | Waymo’s compute approach is depth-over-breadth: use the world’s best AI training infrastructure (Google TPUs) for a narrow, well-defined problem domain (autonomous driving perception and planning), with Google’s simulation scale for edge case coverage. The strategy works well in Waymo’s current operational envelope. The key risk: if AI architectures shift in ways that favor a different compute paradigm, Waymo is dependent on Google’s roadmap rather than its own. |
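
The pod-scale and edge-case figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the per-chip throughput (275 TFLOPS BF16) and pod size (4,096 chips) are Google's published TPU v4 figures, the 15B miles per day is this article's estimate, and reading "1-in-a-million" as one event per million miles is an assumption made purely for the example.

```python
# Back-of-envelope check of the Section 1 figures.
TPU_V4_CHIP_TFLOPS = 275        # peak BF16 per TPU v4 chip (Google's published figure)
TPU_V4_CHIPS_PER_POD = 4096     # chips in a full TPU v4 pod (Google's published figure)

SIM_MILES_PER_DAY = 15e9        # article's estimate for CarCraft
MILES_PER_RARE_EVENT = 1e6      # ASSUMPTION: "1-in-a-million" read as one event per million miles


def pod_peak_eflops(chip_tflops: float, chips: int) -> float:
    """Peak pod throughput in EFLOPS (1 EFLOPS = 1e6 TFLOPS)."""
    return chip_tflops * chips / 1e6


def expected_rare_events(miles: float, miles_per_event: float) -> float:
    """Expected number of rare events encountered over a given mileage."""
    return miles / miles_per_event


if __name__ == "__main__":
    pod = pod_peak_eflops(TPU_V4_CHIP_TFLOPS, TPU_V4_CHIPS_PER_POD)
    events = expected_rare_events(SIM_MILES_PER_DAY, MILES_PER_RARE_EVENT)
    print(f"TPU v4 pod peak: {pod:.2f} EFLOPS")
    print(f"Expected rare-event encounters per simulated day: {events:,.0f}")
```

Under these assumptions a single pod peaks just above 1 EFLOPS, and a one-per-million-miles scenario would appear thousands of times in a single day of simulation, which is the sense in which simulation scale substitutes for real-world exposure.
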
Section 2 — Tesla’s Compute Stack: Dojo D1 + NVIDIA Clusters
Tesla’s compute strategy is the opposite of Waymo’s: rather than leveraging an existing hyperscaler’s infrastructure, Tesla built its own chip and supercomputer specifically optimized for its primary training workload — video.
| Compute dimension | Tesla detail | Strategic significance |
|---|---|---|
| Dojo supercomputer architecture | Tesla designed the D1 chip (7nm, 362 TFLOPS BF16, 900 GB/s memory bandwidth per chip) specifically for video training; D1 chips are packaged into training tiles (25 chips per tile = approx. 9 PFLOPS), tiles are assembled into an ExaPOD (120 tiles = approx. 1.1 EFLOPS per ExaPOD), and ExaPODs make up the full Dojo cluster | Dojo’s architecture is optimized for Tesla’s specific training workload: large batches of video frames from millions of vehicles. The chip topology (high-bandwidth interconnects between tiles) minimizes data movement overhead for video training |
| Why Tesla built its own chip | Tesla’s primary training workload is video: billions of 8-camera video segments from 6M vehicles; existing GPU and TPU architectures were not optimally designed for this specific workload pattern; custom silicon allows Tesla to optimize for memory bandwidth, interconnect topology, and precision format for video | Custom silicon development costs hundreds of millions of dollars and takes 3–5 years; Tesla’s justification is that training cost savings over a 5–10 year horizon exceed the development cost — the same logic Apple applied to M-series chips |
| Dojo vs. NVIDIA GPU clusters | Tesla also uses NVIDIA H100 clusters for training (Dojo supplements, does not fully replace NVIDIA); an NVIDIA H100 SXM delivers approximately 1,000 TFLOPS of dense BF16 per GPU (approximately 2,000 TFLOPS with structured sparsity); a 10,000-GPU H100 cluster therefore peaks at roughly 10 EFLOPS dense, or 20 EFLOPS with sparsity; Tesla’s combined Dojo and NVIDIA compute is among the largest single-company AI compute deployments outside of hyperscalers (est.) | Tesla’s dual-track strategy (Dojo for video-optimized training plus NVIDIA for general AI) reflects pragmatism: H100s are available now; Dojo ramps over time. Running both allows Tesla to continuously improve FSD without waiting for Dojo to mature |
| Training data pipeline | Tesla’s primary compute advantage is data, not chips: 6M vehicles multiplied by average 1 hour per day FSD engaged multiplied by 8 cameras = enormous daily video volume; labeling is automated via the Data Engine (shadow mode: FSD makes a decision, a human corrects it, the correction becomes labeled training data) | The Data Engine’s compute requirement is itself massive: running shadow mode inference on millions of vehicles and processing the corrections requires significant inference and storage infrastructure, not just training compute |
| Dojo deployment timeline | First Dojo ExaPOD operational in 2022 (Texas Gigafactory); Musk targeted 100 EFLOPS by late 2024 (est.); actual deployment pace not fully disclosed; Tesla’s subsequent investment in NVIDIA H100 clusters suggests Dojo ramp was slower than planned (est.) | Slower-than-planned Dojo ramp is consistent with custom silicon’s typical timeline overruns; this is not a failure — it is the normal trajectory of a first-generation custom chip. NVIDIA H100 fills the gap until Dojo v2 (next-gen) |
| Dojo v2 and future compute | Tesla has referenced a next-generation Dojo chip; details not disclosed as of mid-2026 (est.); if Dojo v2 follows the typical 2x performance-per-generation improvement, Tesla’s training compute could reach hundreds of EFLOPS by 2027 (est.) | The trajectory matters more than current capacity: if Dojo v2 delivers and Tesla’s training compute reaches hyperscaler scale, Tesla would be the only non-hyperscaler with proprietary AI training silicon at that level |
| Compute strategy verdict | Tesla’s compute approach is build-vs-buy at maximum ambition: build a custom chip and supercomputer optimized for your specific training workload, while relying on NVIDIA GPUs in the interim. The strategy is high-risk (custom silicon often underdelivers) and high-reward (if Dojo works as designed, Tesla’s training cost per FSD improvement drops dramatically). The key risk: Dojo D1 may not achieve the performance and yield targets that justify the development cost relative to continued NVIDIA dependence. |
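
The Dojo hierarchy figures follow from simple multiplication, and the same arithmetic shows how many ExaPODs a 100 EFLOPS target implies. The sketch below uses the table's D1 figures (362 TFLOPS, 25 chips per training unit, 120 units per ExaPOD). The H100 numbers are NVIDIA's datasheet values for the SXM part (about 989 TFLOPS dense BF16, about 1,979 with structured sparsity). All results are peak, not sustained, throughput, and the ExaPOD count assumes D1-class hardware throughout, which is an assumption for illustration.

```python
import math

D1_TFLOPS_BF16 = 362          # per D1 chip (article figure)
CHIPS_PER_TILE = 25           # chips in one 25-chip training unit (article figure)
TILES_PER_EXAPOD = 120        # training units per ExaPOD (article figure)

H100_TFLOPS_DENSE = 989       # H100 SXM dense BF16 (NVIDIA datasheet)
H100_TFLOPS_SPARSE = 1979     # H100 SXM BF16 with structured sparsity (NVIDIA datasheet)

TARGET_EFLOPS = 100           # target quoted in the table


def tflops_to_eflops(tflops: float) -> float:
    return tflops / 1e6


def exapods_for_target(target_eflops: float, exapod_eflops: float) -> int:
    """Whole ExaPODs needed to reach a peak-throughput target."""
    return math.ceil(target_eflops / exapod_eflops)


tile_tflops = D1_TFLOPS_BF16 * CHIPS_PER_TILE                # TFLOPS per training unit
exapod_eflops = tflops_to_eflops(tile_tflops * TILES_PER_EXAPOD)
h100_dense_eflops = tflops_to_eflops(H100_TFLOPS_DENSE * 10_000)
h100_sparse_eflops = tflops_to_eflops(H100_TFLOPS_SPARSE * 10_000)

if __name__ == "__main__":
    print(f"Training unit: {tile_tflops / 1000:.2f} PFLOPS")
    print(f"ExaPOD: {exapod_eflops:.3f} EFLOPS")
    print(f"10,000 H100s: {h100_dense_eflops:.2f} EFLOPS dense, "
          f"{h100_sparse_eflops:.2f} EFLOPS sparse")
    print(f"ExaPODs for {TARGET_EFLOPS} EFLOPS: "
          f"{exapods_for_target(TARGET_EFLOPS, exapod_eflops)}")
```

The comparison is only as good as the precision formats and utilization behind each peak number, so treat the output as an order-of-magnitude guide rather than a benchmark result.
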
Section 3 — Head-to-Head Compute Comparison
| Dimension | Waymo / Google TPU | Tesla Dojo + NVIDIA | Edge |
|---|---|---|---|
| Training compute scale (est.) | Access to Google’s full TPU fleet, potentially hundreds of EFLOPS (est.); shared with all Google AI projects | Tesla’s combined Dojo and NVIDIA capacity: tens of EFLOPS (est.); dedicated to Tesla AI workloads only | Waymo has access to more total compute; Tesla has more dedicated compute |
| Compute cost structure | Effectively subsidized (Alphabet subsidiary); no market-rate payments for Google TPU | Mixed: Dojo capex amortized over training lifetime; NVIDIA H100 rented/purchased at market rates; significant but finite | Waymo decisive on compute cost per training run at current scale |
| Chip customization for AV | TPUs optimized for Google’s workloads (not AV-specific); flexible but not specialized | Dojo D1 designed specifically for video training at AV scale | Tesla decisive on silicon fit-for-purpose; Waymo uses general-purpose AI chips |
| Data volume for training | Approximately 30M driverless commercial miles (est.); high-purity (fully driverless = clean labels) but lower volume | Approximately 6B supervised FSD miles (est.); lower label purity (human-supervised) but massive volume | Tesla decisive on data volume; Waymo decisive on data purity |
| Simulation scale | 15B simulated miles per day (est.) via CarCraft on Google Cloud | Growing simulation capability via Dojo; scale not disclosed | Waymo decisive on simulation at current scale |
| Control over compute roadmap | Dependent on Google’s TPU roadmap (TPU v5 to v6 etc.); no independent chip design | Tesla controls its own chip roadmap; can optimize D1 to D2 for AV-specific needs | Tesla decisive on compute sovereignty and roadmap control |
| Overall compute verdict | Waymo’s Google Cloud / TPU advantage is structural today: more total compute, lower effective cost, best-in-class TPU performance, unmatched simulation scale. Tesla’s Dojo advantage is strategic over time: dedicated silicon optimized for the specific video training workload, an independent roadmap, and none of the capacity sharing that Waymo faces alongside other Alphabet AI projects. The 2028 question is whether Dojo v2 delivers on its performance promise. |
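
The data-volume comparison reduces to a few ratios. The sketch below uses only figures quoted in this article (6M vehicles, 1 hour of FSD per day, 8 cameras, roughly 6B supervised miles, roughly 30M driverless miles, 15B simulated miles per day). All of them are estimates, and the ratios say nothing about label quality.

```python
FLEET_VEHICLES = 6_000_000          # Tesla fleet size (article estimate)
FSD_HOURS_PER_VEHICLE_PER_DAY = 1   # average FSD-engaged time (article estimate)
CAMERAS_PER_VEHICLE = 8

TESLA_SUPERVISED_MILES = 6e9        # article estimate
WAYMO_DRIVERLESS_MILES = 30e6       # article estimate
WAYMO_SIM_MILES_PER_DAY = 15e9      # article estimate


def camera_hours_per_day(vehicles: int, hours: float, cameras: int) -> float:
    """Total hours of single-camera video the fleet could produce per day."""
    return vehicles * hours * cameras


def ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator


if __name__ == "__main__":
    hours = camera_hours_per_day(
        FLEET_VEHICLES, FSD_HOURS_PER_VEHICLE_PER_DAY, CAMERAS_PER_VEHICLE
    )
    print(f"Fleet camera-hours per day: {hours:,.0f}")
    print(f"Tesla real miles vs Waymo real miles: "
          f"{ratio(TESLA_SUPERVISED_MILES, WAYMO_DRIVERLESS_MILES):.0f}x")
    print(f"Waymo simulated miles per day vs its real miles to date: "
          f"{ratio(WAYMO_SIM_MILES_PER_DAY, WAYMO_DRIVERLESS_MILES):.0f}x")
```

The camera-hours figure is an upper bound on raw capture, not on what is uploaded or used for training, which the article does not quantify.
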
Section 4 — What Compute Determines in the AV Race
| AI capability | How compute determines it | Waymo advantage | Tesla advantage |
|---|---|---|---|
| Perception accuracy | Better training data and more compute lead to lower detection error rate; perception models must train on billions of labeled frames | Driverless label purity: no human-supervised noise in training data | 6B miles of video data; volume enables rare-case coverage |
| Prediction (other agents) | Modeling human behavior requires training on diverse real-world scenarios; simulation fills gaps that real-world data cannot cover | 15B simulated miles per day covers edge cases systematically | Scale of real-world data provides behavioral diversity that simulation approximates |
| Planning (what to do) | Planning policy training requires simulation at scale to test edge cases safely; real-world testing is too dangerous and expensive for rare scenarios | Google Cloud simulation scale is decisive for planning policy improvement | End-to-end FSD v12 collapses perception and planning into one network, which reduces the compute problem from two steps to one |
| Generalization (new cities) | Generalizing to new cities requires either: (a) training on data from that city, or (b) compute-intensive simulation of that city’s scenarios | HD map and simulation approach means Waymo must generate maps and simulate each new city before commercial launch | Tesla’s mapless FSD approach means no city-specific simulation required; the model generalizes from the training distribution |
| Model iteration speed | Faster training compute leads to more experiments per week and faster model improvement | More total TPU access means more simultaneous experiments possible | Dedicated Dojo compute means Tesla’s experiments do not contend with other teams for capacity, whereas Waymo shares TPUs with other Google AI projects |
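
The link between compute and iteration speed can be made concrete with a rough throughput model: experiments per week equal sustained FLOPS times seconds per week, divided by the FLOPs one experiment needs. This is a minimal sketch. The utilization and per-experiment cost used in the example are hypothetical placeholders, not figures from Waymo or Tesla.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def experiments_per_week(peak_eflops: float,
                         utilization: float,
                         flops_per_experiment: float) -> float:
    """Rough count of training runs a cluster can finish in one week.

    peak_eflops: peak cluster throughput in EFLOPS.
    utilization: fraction of peak actually sustained (0 to 1).
    flops_per_experiment: total floating-point operations for one run.
    """
    sustained_flops = peak_eflops * 1e18 * utilization
    return sustained_flops * SECONDS_PER_WEEK / flops_per_experiment


if __name__ == "__main__":
    # HYPOTHETICAL inputs chosen only to illustrate the scaling.
    utilization = 0.4
    flops_per_experiment = 1e22
    for label, eflops in [("1 EFLOPS cluster", 1.0), ("10 EFLOPS cluster", 10.0)]:
        runs = experiments_per_week(eflops, utilization, flops_per_experiment)
        print(f"{label}: about {runs:.0f} experiments per week")
```

The model is linear in both peak throughput and utilization, which is why dedicated capacity with high utilization can partly offset a smaller fleet.
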
Section 5 — Compute Benchmark Scorecard
| Dimension | Waymo / Google | Tesla Dojo + NVIDIA | Edge | 2028 outlook |
|---|---|---|---|---|
| Total training compute access | Decisive — Google TPU fleet is among the largest AI compute deployments on Earth | Large but not at Google scale | Waymo (current) | Tesla closes gap as Dojo scales |
| Compute cost efficiency | Decisive — effectively subsidized as Alphabet subsidiary | Market-rate NVIDIA plus Dojo capex | Waymo (current) | Depends on Dojo D2 delivery |
| Silicon fit for AV workload | General-purpose TPU (flexible but not AV-optimized) | Dojo D1 designed for video training (AV-optimized) | Tesla | Tesla’s purpose-built silicon is a long-term advantage if it delivers |
| Compute roadmap control | Dependent on Google TPU roadmap | Independent Dojo roadmap | Tesla | Tesla’s control over silicon roadmap is a strategic asset |
| Simulation scale | Decisive: 15B simulated miles per day (est.) | Growing; scale not disclosed | Waymo (current) | Both scale; Waymo head start significant |
| Training data quality × volume | Higher purity (driverless), lower volume | Lower purity (supervised), much higher volume | Depends on use case | Volume advantage compounds as Tesla fleet grows |
| Overall verdict | Waymo has the superior compute infrastructure today by most metrics: more total TPU access, lower effective cost, and the world’s best simulation scale. Tesla’s bet is that Dojo — purpose-built for video training — will eventually deliver lower cost per training run than general-purpose TPUs, and that data volume (6M vehicles) will more than compensate for lower label purity. The 2028 compute race is Dojo v2 vs TPU v6: which chip roadmap better serves the specific demands of training a generalist AV policy at scale. |
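
The scorecard's "Edge" column can be tallied mechanically. The sketch below transcribes the six dimension rows and counts them per company. It weights every dimension equally, which is an assumption of the example and not a claim the scorecard makes.

```python
from collections import Counter

# "Edge" column transcribed from the scorecard rows above.
SCORECARD_EDGES = {
    "Total training compute access": "Waymo (current)",
    "Compute cost efficiency": "Waymo (current)",
    "Silicon fit for AV workload": "Tesla",
    "Compute roadmap control": "Tesla",
    "Simulation scale": "Waymo (current)",
    "Training data quality × volume": "Depends on use case",
}


def normalize(edge: str) -> str:
    """Drop the '(current)' qualifier so labels group by company."""
    return edge.replace(" (current)", "")


def tally_edges(edges: dict[str, str]) -> Counter:
    """Count how many dimensions each label wins, with equal weights."""
    return Counter(normalize(edge) for edge in edges.values())


if __name__ == "__main__":
    for label, count in tally_edges(SCORECARD_EDGES).most_common():
        print(f"{label}: {count}")
```

An equal-weight tally matches the verdict's "by most metrics" framing for the current state, but it ignores the 2028 outlook column, where most of Tesla's case sits.
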
All figures labeled (est.) are derived from public company disclosures, analyst estimates, and industry benchmarks.
Sources
- Tesla Dojo D1 chip architecture (Tesla AI Day 2021)
- Google TPU v5 announcement (Google Cloud)
- Waymo CarCraft simulation (Waymo research blog)
- Tesla FSD training data pipeline (Tesla AI Day 2022)
- Google Alphabet AI infrastructure (Alphabet earnings)