AI-Daily-Builder

2026-06-18

Physical AI Compute 2026 — Waymo Google TPU Access vs Tesla Dojo D1 and FSD Chip: The AI Training Infrastructure Benchmark

Waymo trains on Google TPU clusters. Tesla has Dojo D1 plus data from a fleet of 6M-plus vehicles. The training compute gap is Physical AI’s hidden rate-limiter.

Article 182 in the Physical AI Benchmark Series — AI Training and Inference Infrastructure

The race to build the world’s best autonomous driving system is also a race for AI compute. Training compute determines how fast each company can improve its models. Inference compute determines whether those improved models can run in a vehicle in real time. Both dimensions matter — and both are rarely analyzed together with the rigor applied to sensor hardware, safety miles, or regulatory approvals. This article benchmarks the AI training and inference infrastructure of Waymo and Tesla as a core Physical AI competitive variable.


Section 1 — Why AI Training Compute is a Physical AI Rate-Limiter

Autonomous driving is, at its foundation, a machine learning problem. The quality of an autonomous driving system is bounded by two things: the quality of its training data, and the compute available to train on that data. More compute enables larger models, more experiments, faster iteration cycles, and better generalization. Holding the dataset fixed, a tenfold increase in training compute typically produces a better model, though with diminishing returns. Training compute is the “rate of improvement” lever: the variable that determines how fast the quality ceiling rises.
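To make the rate-of-improvement framing concrete, the sketch below estimates the wall-clock time of a single training run from cluster size. It is illustrative only: the function name, the 1e23 FLOP job size, the chip counts, and the 40% utilization figure are all assumptions, not figures from either company. Only the 275 TFLOPS per-chip value comes from this article (the TPU v4 estimate quoted in Section 2).

```python
def training_days(total_flops: float, chips: int,
                  peak_tflops_per_chip: float, utilization: float) -> float:
    """Estimate wall-clock days for one training run on a cluster.

    sustained throughput = chips * peak per chip * utilization
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flops_per_s = chips * peak_tflops_per_chip * 1e12 * utilization
    seconds = total_flops / sustained_flops_per_s
    return seconds / 86_400  # seconds per day


JOB_FLOPS = 1e23  # hypothetical training job size, not a disclosed figure

# Two hypothetical clusters that differ only in chip count (10x apart).
small = training_days(JOB_FLOPS, chips=512, peak_tflops_per_chip=275, utilization=0.4)
large = training_days(JOB_FLOPS, chips=5_120, peak_tflops_per_chip=275, utilization=0.4)

print(f"512 chips:   {small:.1f} days per run")
print(f"5,120 chips: {large:.1f} days per run")
```

Under these assumptions the larger cluster completes the same run in one tenth of the time, which is the sense in which compute sets the iteration rate. Real scaling is rarely this linear, because utilization tends to fall as clusters grow.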

Two critical compute dimensions shape the physical AI race:

Training compute — used to improve the model offline, at headquarters. This is where Google TPUs and Tesla Dojo matter. Training compute is the behind-the-scenes accelerator that does not affect today’s vehicle but determines how good next quarter’s software update will be.

Inference compute — the chip inside the vehicle that runs the model in real time. This is where Tesla’s FSD HW3/HW4 silicon and Waymo’s in-vehicle compute matter. Inference compute determines what the current vehicle can execute safely today.

Training and inference are separate concerns. A company can have world-class training compute but limited in-vehicle inference capability — or vice versa. The compounding advantage comes from excelling at both.

NVIDIA dominance as the baseline: Most AV companies — Zoox, Aurora, Mobileye, Cruise — train their models on NVIDIA GPU clusters (A100, H100, H200). This is the commodity baseline. The interesting competitive differentiation comes from companies that depart from this baseline in one of two directions: (1) access to proprietary non-NVIDIA training silicon (Waymo via Google TPUs, Alphabet being the parent), or (2) investment in custom training silicon (Tesla via Dojo D1).

Vertical integration rarity: Tesla is one of a very small number of companies designing both its own training silicon (Dojo D1) and its own inference silicon (the FSD chip, with the HW3 and HW4 generations fabricated by Samsung). This vertical integration is costly and technically demanding, but it provides strategic independence from NVIDIA supply constraints and cost structures. No other AV company approaches Tesla’s level of silicon vertical integration.

The data flywheel and the compute flywheel interact: more data enables better training; more compute enables faster training on that data; better models enable better autonomous driving, which generates more and higher-quality data. The two flywheels compound. Understanding which company has the stronger position on each flywheel — and which company has the structural advantage in the interaction between them — is the central analytical question this article addresses.


Section 2 — Waymo’s Compute Advantage: Google TPU Access

Waymo’s structural advantage in training compute derives entirely from one fact: it is an Alphabet subsidiary. Alphabet has built one of the largest custom AI compute deployments on Earth, and Waymo trains its models on that infrastructure at internal transfer pricing.

Dimension | Detail
Training infrastructure | Waymo trains its models on Google’s TPU (Tensor Processing Unit) clusters. Google has one of the largest TPU deployments in the world. As an Alphabet subsidiary, Waymo receives priority access to this compute at internal transfer pricing, well below commercial GPU cluster rental rates.
Google TPU v4 and v5e specifications | Google TPU v4 delivers est. 275 TFLOPS (BF16) per chip. TPU v5e delivers est. 197 TFLOPS (BF16) per chip and is positioned as a cost-efficiency-oriented part rather than a per-chip throughput upgrade. Google runs TPU pods with hundreds to thousands of chips interconnected via high-bandwidth fabric.
Effective training capacity | Waymo’s effective training compute capacity, drawn from Alphabet’s infrastructure, likely exceeds that of any standalone AV startup and most AV subsidiaries. Only Zoox (Amazon AWS) and Waymo (Google TPU) have cloud-parent compute advantages at this tier.
Cost of compute access | Waymo pays Alphabet for compute at internal transfer pricing. This is an ongoing opex cost but is estimated to be significantly below market rate for equivalent GPU cluster access. The exact figures are not publicly disclosed.
Training data pipeline | Waymo’s training data comes from its driverless commercial fleet (sensor data from commercial rides), HD mapping data, and Carcraft simulation. LIDAR plus camera plus radar produces multimodal training data with richer per-frame signal than camera-only approaches.
Carcraft simulation | Waymo uses its Carcraft simulation platform to generate synthetic training scenarios at scale. Carcraft has been described as running millions of simulated miles per day. Simulation plus real-world data constitutes Waymo’s combined training dataset. All simulation output figures are estimates from public statements.
Comparison to standalone AV companies | Waymo’s Google TPU access is a structural compute advantage over AV companies that must purchase or rent NVIDIA GPU clusters on the open market. Aurora, Mobileye, and Zoox (which has AWS) all operate in the GPU cluster tier. Waymo operates in the TPU pod tier via parental access.
Waymo’s critical compute constraint | Despite Google TPU access, Waymo’s training data volume is limited by fleet size, estimated at approximately 2,500 vehicles as of mid-2026. Tesla’s 6M-plus FSD-capable vehicle fleet generates orders of magnitude more training data. Compute cannot compensate for a data deficit of this magnitude.
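The per-chip figures quoted above only become meaningful at pod scale. The sketch below multiplies them out to a peak aggregate throughput. The pod sizes are hypothetical examples within the article’s “hundreds to thousands of chips” range, not disclosed Waymo allocations, and the function name is illustrative. Peak numbers also ignore utilization, memory, and interconnect limits.

```python
# Estimated peak BF16 throughput per chip, as quoted in this article.
TPU_PEAK_TFLOPS = {"v4": 275.0, "v5e": 197.0}


def pod_peak_pflops(generation: str, chips: int) -> float:
    """Peak aggregate throughput of a pod in PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return TPU_PEAK_TFLOPS[generation] * chips / 1_000


# Hypothetical pod sizes, chosen only to span "hundreds to thousands".
for chips in (256, 1_024, 4_096):
    v4 = pod_peak_pflops("v4", chips)
    v5e = pod_peak_pflops("v5e", chips)
    print(f"{chips:>5} chips: v4 {v4:8.1f} PFLOPS | v5e {v5e:8.1f} PFLOPS")
```

At 4,096 chips, the v4 estimate works out to roughly 1.1 exaFLOPS of peak BF16 throughput, which gives a sense of the tier the article is describing.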

The structural tension in Waymo’s position: Waymo has better training compute access than almost any competitor — but far less training data than Tesla. The question is whether more compute per data point can substitute for fewer total data points. In machine learning at scale, the empirical answer is generally no: data quantity at sufficient quality almost always dominates compute-per-sample efficiency gains beyond a threshold.


Section 3 — Tesla Dojo: Custom Training Silicon at Scale

Tesla’s strategic bet on Dojo D1 is one of the most ambitious custom silicon projects in the technology industry. Building a custom AI supercomputer from scratch — designing the chip, the interconnect, the cooling, the software stack, and the training framework — required a multi-year commitment of capital and engineering talent that almost no other company has attempted.

Dimension | Detail
What Dojo is | Dojo is Tesla’s custom AI supercomputer built specifically for video training, the dominant training modality for FSD. Tesla’s fleet generates billions of miles of camera footage. Processing this data efficiently at scale requires hardware optimized specifically for video workloads, not general-purpose ML. Dojo is that hardware.
D1 chip specifications (est.) | Tesla D1 chip: est. 362 TFLOPS (BF16) per chip. Designed for high-bandwidth inter-chip connectivity via a custom die-to-die interface. 25 D1 chips tile together into a “training tile.” Tiles are assembled into cabinets, and cabinets into an ExaPOD. The architecture is designed to minimize data movement cost between chips, the dominant cost in large-scale video training. All figures are estimates from Tesla AI Day disclosures (2021 and 2022).
Dojo vs. NVIDIA H100 | NVIDIA H100: est. 989 TFLOPS (BF16) per chip, approximately 2.7x the per-chip throughput of D1. However, D1 is designed for high-bandwidth tiling at lower cost-per-FLOP at scale, optimized specifically for the video training workloads Tesla runs rather than general ML. At sufficiently large scale, Dojo’s architecture may offer better cost efficiency for Tesla’s specific workload profile.
Dojo scale (est.) | Tesla began ramping Dojo capacity in 2023–2024. Target: multi-exaFLOP cluster by 2025–2026 (est.). Exact current deployed capacity has not been publicly disclosed. Elon Musk has cited aggressive Dojo scaling targets at multiple shareholder and product events.
Why Tesla built Dojo | Three motivations: (1) NVIDIA GPU supply constraints during the 2021–2023 shortage created single-source dependency risk; (2) lower cost per FLOP at scale for Tesla’s specific video training workload profile; (3) strategic independence from NVIDIA pricing and allocation decisions. A potential fourth motivation: selling Dojo compute as a service to external AI and video processing companies.
Dojo training applications | Primary: FSD video training, processing billions of miles of camera footage from Tesla’s 6M-plus vehicle fleet. Secondary: Optimus humanoid robot neural net training using the same video-based approach. Potential future: external AI and video training workloads as a commercial compute service.
Dojo plus NVIDIA hybrid | Tesla also operates a large NVIDIA H100 cluster alongside Dojo. Estimates as of 2024 put Tesla’s training infrastructure at 30,000-plus H100 GPUs. Dojo is additive capacity, not a near-term replacement for NVIDIA.
Dojo capital expenditure (est.) | Building Dojo is capital-intensive. Tesla has cited over $1B (est.) in Dojo investment through 2024. Ongoing expansion adds to this. This is a significant multi-year capital bet on custom silicon over the NVIDIA commodity path.
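The arithmetic behind the figures above can be checked in a few lines. This is a sketch using only the article’s estimated peak numbers; the variable names are illustrative, and the “tiles to match” figure is a peak-throughput comparison only, not a claim about real training performance, which depends on utilization, memory, and interconnect.

```python
import math

# Estimated figures quoted in this article (peak BF16, per chip).
D1_TFLOPS = 362.0
H100_TFLOPS = 989.0
CHIPS_PER_TILE = 25
H100_COUNT = 30_000  # "30,000-plus" taken as a lower bound

# One training tile: 25 D1 chips.
tile_pflops = D1_TFLOPS * CHIPS_PER_TILE / 1_000

# Per-chip throughput ratio cited as "approximately 2.7x".
h100_to_d1 = H100_TFLOPS / D1_TFLOPS

# Peak throughput of the estimated H100 cluster, in exaFLOPS.
h100_cluster_eflops = H100_TFLOPS * H100_COUNT / 1e6

# D1 tiles needed to equal that cluster on peak throughput alone.
tiles_to_match = math.ceil(H100_COUNT * h100_to_d1 / CHIPS_PER_TILE)

print(f"Training tile peak:   {tile_pflops:.2f} PFLOPS")
print(f"H100 / D1 per chip:   {h100_to_d1:.2f}x")
print(f"H100 cluster peak:    {h100_cluster_eflops:.2f} EFLOPS")
print(f"D1 tiles to match it: {tiles_to_match:,}")
```

On these estimates the NVIDIA cluster alone is already in multi-exaFLOP territory, which is consistent with the article’s description of Dojo as additive capacity rather than a replacement.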

The Dojo strategic thesis: If Dojo achieves cost-per-FLOP parity or better with NVIDIA H100s at Tesla’s scale, Tesla gains a structural training cost advantage that compounds over time — both for FSD model quality and for Optimus training. Combined with Tesla’s overwhelming data volume advantage, Dojo-at-scale creates a training flywheel that no competitor can replicate without making an equivalent multi-billion-dollar custom silicon bet.


Section 4 — In-Vehicle Inference: FSD Chip vs. Waymo’s In-Vehicle Compute

Training compute and inference compute are separate races. A better training cluster makes better models. But those models must then run in real time on hardware inside the vehicle — at low latency, low power consumption, and with enough headroom to handle edge cases. The in-vehicle inference chip is the physical AI “last mile” — the component that turns training improvements into real-world driving capability.

Dimension | Waymo | Tesla FSD | Notes
In-vehicle compute platform | Waymo uses custom compute hardware in its vehicles. Specific chip specifications are not publicly disclosed. The hardware must run perception fusion (LIDAR plus camera plus radar), prediction, and planning simultaneously in real time. | Tesla HW3: est. 144 TOPS, in most current FSD-capable vehicles. Tesla HW4: est. 1,000-plus TOPS, in newer vehicles since early 2023. A significant leap between hardware generations. | Tesla has published FSD chip architecture details publicly. Waymo has not disclosed its in-vehicle hardware specifications.
Custom silicon | Waymo has not announced a custom in-vehicle inference chip. It likely uses commercial accelerator hardware for in-vehicle compute. | Tesla designs its own FSD inference chip, with HW3 and HW4 fabricated by Samsung. Earlier hardware generations used third-party silicon; the in-house chip design team has since executed multiple generations (HW3 and HW4). This is extremely rare for an AV company. | Tesla’s chip vertical integration from training (Dojo) to inference (FSD chip) is unmatched among AV companies.
Inference efficiency | Waymo’s multi-sensor fusion (LIDAR plus camera plus radar) requires significant compute per frame to fuse multiple modalities. The compute load per sensor frame is higher than a camera-only approach. | Tesla’s camera-only approach reduces per-sensor compute, but the end-to-end neural network model is large. HW4’s 1,000-plus TOPS provides substantial headroom for larger models and more complex inference. | Tesla HW4’s compute headroom may enable capabilities that cannot be served by HW3, raising the software improvement ceiling for newer vehicles.
OTA model deployment | Waymo updates software and ML models over-the-air across its fleet. All vehicles in the fleet receive model improvements simultaneously. | Tesla updates FSD software over-the-air. Hardware capability is fixed (HW3 vs. HW4) but software can extract progressively more from existing hardware generations within their compute envelope. | Both fleets receive model improvements simultaneously via OTA. Tesla’s 6M-plus vehicle fleet distributes each model improvement to a far larger base.
Fleet-wide improvement multiplier | All of Waymo’s estimated 2,500 vehicles benefit simultaneously from model updates. | Tesla’s 6M-plus FSD-capable vehicles all receive the same OTA model update simultaneously. | The value of each model improvement multiplies with fleet size. Tesla’s fleet multiplier is roughly 2,400x Waymo’s.
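The roughly 2,400x fleet multiplier follows directly from the two fleet estimates used throughout this article. The short check below also expresses the gap in orders of magnitude. Treat the inputs as the article’s estimates, with “6M-plus” taken as a lower bound; vehicle count is only a proxy for data volume or deployed value.

```python
import math

WAYMO_FLEET = 2_500       # est. vehicles, as quoted in this article
TESLA_FLEET = 6_000_000   # "6M-plus" FSD-capable vehicles, lower bound

fleet_ratio = TESLA_FLEET / WAYMO_FLEET
orders_of_magnitude = math.log10(fleet_ratio)

print(f"Fleet ratio: {fleet_ratio:,.0f}x "
      f"(about {orders_of_magnitude:.1f} orders of magnitude)")
```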

HW3 versus HW4 divide: A significant fraction of Tesla’s FSD-capable fleet runs on HW3 (est. 144 TOPS). The capability ceiling for FSD on HW3 is lower than on HW4. As Tesla continues producing vehicles with HW4 and the fleet mix shifts, the average in-vehicle inference compute available to FSD increases — expanding the model capability that can be deployed fleet-wide over time.
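The effect of a shifting fleet mix can be sketched as a weighted average of the two hardware generations. The TOPS values are the article’s estimates, with “1,000-plus” taken as a lower bound. The HW4 share values are hypothetical, since the actual fleet mix is not given here, and the function name is illustrative.

```python
HW3_TOPS = 144.0     # est., from this article
HW4_TOPS = 1_000.0   # est. "1,000-plus", taken as a lower bound


def fleet_average_tops(hw4_share: float) -> float:
    """Fleet-weighted average inference compute for a given HW4 share (0 to 1)."""
    if not 0.0 <= hw4_share <= 1.0:
        raise ValueError("hw4_share must be between 0 and 1")
    return (1.0 - hw4_share) * HW3_TOPS + hw4_share * HW4_TOPS


# Hypothetical fleet mixes, for illustration only.
for share in (0.2, 0.5, 0.8):
    print(f"HW4 share {share:.0%}: average {fleet_average_tops(share):.1f} TOPS")
```

The average is only a summary statistic. Any single model release is still bounded by the hardware in each individual vehicle, so an HW3 vehicle does not benefit from the fleet average rising.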


Section 5 — AI Compute Benchmark Scorecard

Dimension | Waymo / Alphabet | Tesla | 2028 Outlook | Edge
Training compute access | Google TPU pod infrastructure (massive scale, internal transfer pricing) | NVIDIA H100 cluster (est. 30,000-plus GPUs) plus Dojo D1 (custom, growing) | Both operate at large scale. Dojo scaling narrows the gap. | Roughly equal: Waymo Google TPU vs. Tesla Dojo-plus-NVIDIA
Training compute cost | Internal transfer pricing, estimated well below market rate | Significant capex (Dojo) plus opex (NVIDIA cluster costs, est.) | Dojo cost-per-FLOP may decline significantly at scale | Waymo (likely lower near-term training cost)
Training data volume | Limited by est. 2,500 vehicle fleet, orders of magnitude smaller than Tesla | 6M-plus FSD-capable vehicles generating continuous real-world camera data | Gap widens as Tesla fleet grows | Tesla (overwhelming, compounding advantage)
Custom training silicon | None. Uses Alphabet/Google TPUs (Google-designed, not Waymo-designed) | Dojo D1 (Tesla-designed, NVIDIA-independent, custom video training architecture) | Tesla building toward Dojo self-sufficiency over NVIDIA | Tesla (strategic independence)
In-vehicle inference chip | Custom hardware, specifications not publicly disclosed | HW3 (est. 144 TOPS) plus HW4 (est. 1,000-plus TOPS), custom Tesla-designed FSD chip (Samsung-fabricated) | HW5 likely in development. Tesla inference roadmap advancing. | Tesla (published specs, substantial HW4 headroom)
OTA improvement deployment | Fleet of est. 2,500 vehicles receives each model update | 6M-plus vehicles receive each OTA model update simultaneously | Gap widens with fleet growth | Tesla
Vertical integration | Partial: Google TPU for training, undisclosed hardware for inference | High: Dojo for training, custom FSD chip for inference, OTA software stack | Tesla is the most vertically integrated AV company in silicon | Tesla

Overall verdict: Waymo’s access to Google’s TPU infrastructure is a meaningful training compute advantage over standalone AV startups and most AV subsidiaries — but it cannot compensate for the training data deficit created by Waymo’s small fleet. Tesla’s data advantage (6M-plus vehicles generating billions of real-world miles) combined with Dojo’s growing training capacity and the FSD chip’s inference compute creates a compounding data-plus-compute flywheel that no competitor has matched.

The single most important insight from this analysis: in machine learning at scale, data quantity at sufficient quality almost always beats compute quantity alone. Tesla has both more data than any competitor and growing compute capacity. Waymo has more compute access per data point than Tesla — but orders of magnitude fewer data points. The training arms race is currently running in Tesla’s favor on the dimension that matters most: the product of (training data volume) times (model iteration speed). Waymo’s Google TPU advantage is real. Tesla’s data flywheel is larger.


Section 6 — About This Series

This is article 182 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, fleet operations, software and OTA architecture, insurance and liability, partnerships, competitive moats, Cybercab versus Model Y, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, the 2030 forecast scenarios, the investor framework, Waymo city expansion, Tesla state approvals, AV weather constraints, the talent war, the regulatory calendar, robotaxi fare pricing, the data flywheel comparison, the humanoid deployment tracker, supply chain analysis, consumer adoption demand, Waymo valuation and IPO analysis, the software architecture deep dive, and the FSD timeline history.

This article adds the AI training infrastructure dimension: what training compute each company deploys, how their in-vehicle inference hardware compares, and why the interaction between training data volume and training compute capacity is the hidden rate-limiter of physical AI quality improvement. The compute arms race is invisible to most analysts benchmarking autonomous driving — but it is precisely this layer that determines how fast the quality ceiling rises for each company’s next software update.

