2026-06-18

AV Simulation — How Waymo and Tesla Train on Billions of Virtual Miles

Waymo runs an estimated 20B simulated miles per year; Tesla trains on video from 6M vehicles via Dojo. Simulation is the multiplier that separates AV leaders.

Article 74 in the Physical AI Benchmark Series — AV Simulation and Synthetic Data

You cannot teach an autonomous vehicle to handle a pedestrian crossing against a red light, a tire blowing out at highway speed, or a child darting into traffic by waiting for those events to happen on real roads. Simulation is the training multiplier that lets AV companies encounter rare and dangerous scenarios billions of times in software before a real vehicle ever faces them.

Waymo runs an estimated 20 billion simulated miles per year. Tesla uses its Dojo supercomputer to train on video from 6 million real-world vehicles. The gap between AV leaders and followers is not just miles driven. It is the capacity to simulate, to generate synthetic training data at scale, and to close the loop between real-world edge cases and simulated training environments.

This article maps the simulation architectures, the synthetic data pipelines, and what simulation capability means for the Physical AI ramp benchmark.

Section 1 — Why Simulation Is Non-Negotiable

The fundamental problem of AV training is the long tail: the distribution of real-world driving scenarios is extremely wide, but the scenarios that matter most for safety — rare, dangerous, near-miss events — appear extremely infrequently in organic real-world data. Waiting for those events to occur on real roads is not a viable training strategy.

Training challenge	Real-world approach	Simulation approach
Rare but critical events	Wait for a pedestrian to cross against a red light; this may happen once per million miles	Generate millions of synthetic variations of that scenario with randomized timing, speed, and vehicle positions
Fatal scenarios	Cannot intentionally crash a real vehicle into a cyclist	Simulate collisions at high physical fidelity; train avoidance policies with no risk to people or hardware
Edge case coverage	Real fleet accumulates data organically — slow, geographically biased	Simulation can generate data for any geography, weather, time of day, traffic density
Policy iteration speed	Deploy new software → gather real miles → evaluate: weeks per cycle	Test new policy in simulation → evaluate in hours; iterate 100x faster
Corner cases (long tail)	The long tail of rare scenarios is impossibly long in real-world data	Simulation can generate targeted long-tail scenarios on demand
Safety	Training on truly dangerous scenarios is impossible on real roads	Simulation is safe by definition; no risk to humans or hardware
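
The rare-event row in the table above can be made concrete with a little arithmetic. The sketch below assumes events arrive as a Poisson process at the table's illustrative rate of one per million miles; the rate, the function names, and the target of 10,000 examples are illustrative assumptions, not measured figures.

```python
import math

def expected_miles_for_events(n_events: int, events_per_mile: float) -> float:
    """Expected miles needed to observe n_events at a constant event rate."""
    return n_events / events_per_mile

def prob_no_event(miles: float, events_per_mile: float) -> float:
    """Poisson probability of seeing zero events over the given mileage."""
    return math.exp(-events_per_mile * miles)

RATE = 1e-6  # assumed: one event per million miles (illustrative only)

miles_needed = expected_miles_for_events(10_000, RATE)
p_miss = prob_no_event(1_000_000, RATE)

print(f"{miles_needed:.3g} real miles to collect 10,000 examples")
print(f"P(no event in 1M miles) = {p_miss:.3f}")
```

Under these assumptions, collecting 10,000 organic examples takes on the order of 10 billion real miles, and a full million miles of driving still has roughly a 37% chance of containing no example at all. That is the gap simulation is meant to close.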

The core simulation principle: every mile of real-world driving can be multiplied into thousands of simulated variations with different weather, different road users, and different initial conditions. The company that can simulate most effectively can improve its policy network faster than any company relying solely on real-world miles.
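
The multiplication of one recorded scene into many variants can be sketched as a Cartesian product over conditions. This is a minimal illustration, not a description of any company's pipeline: the `Scenario` fields, the `expand` helper, and the condition values are all hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import product

@dataclass(frozen=True)
class Scenario:
    """A hypothetical, minimal description of one logged driving scene."""
    name: str
    weather: str = "clear"
    time_of_day: str = "day"
    pedestrian_speed_mps: float = 1.4

def expand(base: Scenario, weathers, times, speeds) -> list:
    """Return every combination of the supplied conditions applied to base."""
    return [
        replace(base, weather=w, time_of_day=t, pedestrian_speed_mps=s)
        for w, t, s in product(weathers, times, speeds)
    ]

logged = Scenario(name="unprotected_left_turn")
variants = expand(
    logged,
    weathers=["clear", "rain", "fog"],
    times=["day", "dusk", "night"],
    speeds=[1.0, 1.4, 2.5, 4.0],
)
print(len(variants))  # 3 * 3 * 4 = 36 variants from one logged scene
```

Three small lists already yield 36 variants. Adding more axes (traffic density, road-user positions, initial speeds) grows the count multiplicatively, which is how a single real mile can plausibly become thousands of simulated ones.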

Section 2 — Waymo’s Simulation Platform: Carcraft

Waymo operates Carcraft, an internal simulation platform built over more than a decade alongside its real-world AV program. Carcraft is not a supplementary tool — it is Waymo’s primary training environment.

Attribute	Details
Name	Carcraft (Waymo’s internal simulation platform)
Scale	Approximately 20 billion simulated miles per year (est.)
Architecture	High-fidelity physics simulation; realistic sensor modeling (LIDAR point clouds, camera renders, radar returns); agent behavior models for other vehicles, pedestrians, cyclists
Sensor simulation	Waymo simulates the full sensor suite — a simulated LIDAR point cloud must be physically accurate enough for the real perception stack to process it without modification
Scenario generation	Real-world driving data fed back into simulation to create systematic variation of edge cases encountered on real roads
Agent behavior	Other vehicles and pedestrians in Waymo simulations are modeled with calibrated behavior distributions derived from real-world observations
Infrastructure	Runs on Google Cloud TPUs (Waymo and Google share a parent, Alphabet, which enables access to massive compute); one of the largest dedicated simulation compute clusters in any industry (est.)
Real-to-sim loop	When a real Waymo vehicle encounters something unexpected, that scenario is automatically ingested into simulation for training and regression testing

The real-to-sim loop is Waymo’s structural advantage: every real-world edge case becomes simulation training data within hours. A vehicle that encounters an unusual pedestrian behavior in San Francisco can trigger the generation of thousands of synthetic variations of that scenario — different speeds, different lighting, different weather — before the next software update ships.

The scale of Carcraft also enables regression testing at a level impossible for real-world programs. When Waymo ships a software update, it must pass simulation regression against tens of thousands of previously recorded scenarios before a real vehicle runs updated code. This is the simulation safety net.
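
The regression gate described above can be sketched as a simple pass/fail check over a scenario library. This is a toy illustration under stated assumptions: the scenario IDs, the `gap_m` and `speed_mps` fields, and the time-gap "policy" are hypothetical stand-ins, and a real gate would run full closed-loop simulation rather than a one-line predicate.

```python
from typing import Callable, Dict, List

# Hypothetical types: a scenario is a dict of parameters, and a policy is any
# callable that returns True when it handles the scenario safely.
Scenario = Dict[str, float]
Policy = Callable[[Scenario], bool]

def regression_failures(policy: Policy, library: Dict[str, Scenario]) -> List[str]:
    """Return the IDs of library scenarios that the candidate policy fails."""
    return [sid for sid, scenario in library.items() if not policy(scenario)]

def release_allowed(policy: Policy, library: Dict[str, Scenario]) -> bool:
    """Gate: allow release only if every recorded scenario still passes."""
    return not regression_failures(policy, library)

def make_policy(min_time_gap_s: float) -> Policy:
    """Toy policy: passes a scenario if its time gap meets the threshold."""
    return lambda s: s["gap_m"] / s["speed_mps"] >= min_time_gap_s

# Toy library of previously recorded scenarios (illustrative values only).
library = {
    "sf_jaywalker_001": {"gap_m": 12.0, "speed_mps": 8.0},   # 1.5 s gap
    "phx_cut_in_014": {"gap_m": 6.0, "speed_mps": 20.0},     # 0.3 s gap
}

current = make_policy(0.25)    # handles both recorded scenarios
candidate = make_policy(0.5)   # regresses on the tight cut-in

print(release_allowed(current, library))         # True
print(regression_failures(candidate, library))   # ['phx_cut_in_014']
```

The design point is that the gate returns the failing scenario IDs, not just a boolean, so a blocked release points directly at the scenarios that regressed.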

Section 3 — Tesla’s Approach: Real Video Plus Dojo

Tesla’s training philosophy is fundamentally different from Waymo’s. Where Waymo builds synthetic worlds, Tesla harvests the real one.

Attribute	Tesla	Waymo
Primary training data	Real video from 6M or more fleet vehicles (petabytes of real-world camera footage)	Simulation plus real miles from approximately 1,500 autonomous vehicles
Simulation role	Secondary: Tesla uses simulation for specific scenarios, but real video is primary	Primary: approximately 20 billion simulated miles per year (est.)
Dojo	Custom supercomputer built specifically for video training at scale; custom D1 chip optimized for bandwidth between tiles	Uses Google Cloud TPUs (Waymo and Google share the parent company Alphabet)
Dojo D1 chip	Custom 7nm chip; 362 TFLOPS at BF16/CFP8 precision (per Tesla’s AI Day disclosure); 900 GB/s interconnect between chips (est.), designed for distributed video processing	Not applicable
Training objective	Train a neural network that maps 8 camera feeds directly to driving decisions (end-to-end or imitation learning at scale)	Train specific perception, prediction, and planning modules separately; simulation covers each
Advantage	Real-world data distribution — model sees actual edge cases as they happen in reality	Can generate unlimited synthetic data for any scenario; not limited by fleet size
Disadvantage	Cannot train on truly rare or dangerous scenarios without waiting for them to happen	Simulation fidelity gap — simulated sensor data is not perfectly identical to real sensor data

Dojo is Tesla’s answer to Waymo’s access to Google TPU clusters. The D1 chip is purpose-built for the computational bottleneck Tesla faces: ingesting petabytes of continuous video from millions of vehicles and training large neural networks on that data at high throughput. Where conventional GPU clusters can be limited by the memory and interconnect bandwidth demands of distributed video training, the D1’s 900 GB/s inter-chip interconnect (est.) was designed to address that bottleneck.

Tesla’s real-data flywheel creates a different kind of compounding advantage. Every FSD mile driven by a Tesla owner generates training data. As the fleet grows, the training data grows proportionally — and crucially, it grows with the exact distribution of real-world scenarios the model will be deployed in. Waymo must engineer its simulation to match that distribution; Tesla is simply collecting it.

Section 4 — The Sim-to-Real Gap: The Unsolved Problem

The fundamental limitation of simulation-heavy approaches is the sim-to-real gap: a model trained on simulated data may not perform as well when deployed on real-world sensor inputs that differ in subtle ways from the simulation.

Challenge	Description	Current state
Sensor fidelity	A simulated LIDAR point cloud must match a real LIDAR point cloud closely enough for the model to generalize from sim to real	Waymo has invested heavily in high-fidelity sensor simulation; still not perfect — models trained only on sim data underperform on real data
Behavior realism	Simulated pedestrians and drivers must behave like real ones	Calibrated behavior models from real data help; extreme rare behaviors are still hard to model
Domain randomization	Strategy: randomize sim parameters widely so model learns to be robust to any sim variation, which transfers better to the real world	Works for some scenarios; insufficient for others
NeRF and Gaussian splatting	New approach: reconstruct real scenes from camera video into 3D neural representations; re-render from new viewpoints to generate training data	Waymo, Nvidia, and others are using neural scene reconstruction to reduce the sim-to-real gap; promising but compute-intensive
UniSim and GAIA	Waabi (UniSim), Wayve (GAIA-1), and others are building neural simulators that generate photorealistic sensor data from real-world inputs	Active research area; reduces reliance on hand-engineered physics simulation

Domain randomization — deliberately introducing variation in simulated parameters — was the first systematic strategy for sim-to-real transfer. By training on simulations with randomized lighting, texture, weather, and sensor noise, models become more robust to the specific imperfections of a particular simulator. But domain randomization alone has not closed the gap to the level required for production AV deployment in all conditions.
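
Domain randomization as described above amounts to drawing fresh simulator settings for every training episode. The following is a minimal sketch: the `SimParams` fields and all parameter ranges are hypothetical, chosen only to illustrate the sampling pattern.

```python
import random
from dataclasses import dataclass

@dataclass
class SimParams:
    """Hypothetical simulator settings that are re-drawn each episode."""
    sun_elevation_deg: float   # lighting
    rain_intensity: float      # 0.0 (dry) to 1.0 (heavy)
    texture_id: int            # index into an assumed bank of 100 textures
    lidar_noise_std_m: float   # sensor noise, in meters

def sample_params(rng: random.Random) -> SimParams:
    """Draw one randomized configuration; the ranges are illustrative."""
    return SimParams(
        sun_elevation_deg=rng.uniform(-10.0, 90.0),
        rain_intensity=rng.uniform(0.0, 1.0),
        texture_id=rng.randrange(100),
        lidar_noise_std_m=rng.uniform(0.0, 0.05),
    )

rng = random.Random(0)  # seeded so a training run can be reproduced
episodes = [sample_params(rng) for _ in range(1000)]
print(episodes[0])
```

Passing an explicit seeded `random.Random` instance, instead of using the module-level functions, keeps the sampled configurations reproducible. That matters when a failure in one randomized episode needs to be replayed.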

The NeRF and Gaussian splatting approach represents a fundamentally different strategy: rather than building a synthetic world from first principles, capture the real world in a 3D neural representation and re-render it from any viewpoint or under any conditions. A scene recorded by a Waymo vehicle in San Francisco can be re-rendered in rain, at night, with occluded pedestrians added — without requiring the physics simulation pipeline to model those conditions from scratch.

Section 5 — Simulation as Competitive Moat

Simulation capability has become a primary competitive dimension in the AV race. The company that can simulate faster, more accurately, and at greater scale can iterate policy faster than any competitor relying more heavily on real-world testing.

Dimension	Leader	Why it matters
Simulated miles per year	Waymo (approximately 20B miles est.)	More simulated miles means more edge case coverage, which supports safer real-world performance
Simulation compute	Waymo (Google TPU access) vs Tesla (Dojo)	Scale of compute determines how fast policy can be iterated
Real-to-sim pipeline	Waymo (Carcraft real-to-sim loop)	Faster ingestion of real-world edge cases into simulation means faster improvement
Neural simulation	Active race (Waabi UniSim, Wayve GAIA, Nvidia COSMOS, others)	Next frontier: a photorealistic neural simulator could narrow or eliminate the sim-to-real gap
Scenario library	Waymo (largest library built over 10 or more years)	A deep scenario library is hard to replicate — years of engineering to build
Data flywheel integration	Tesla (real fleet → real video → training → better model → larger fleet)	Tesla’s advantage: real data at scale; simulation is supplementary

Nvidia COSMOS (2025): Nvidia launched COSMOS, a platform of world foundation models for physical AI simulation, in early 2025. COSMOS generates photorealistic synthetic video for training robotics and AV systems. It is among the first general-purpose neural world simulators offered as a product, potentially democratizing high-fidelity simulation for companies without Waymo’s or Tesla’s in-house simulation infrastructure. For smaller AV programs, COSMOS could reduce the barrier to entry for high-quality synthetic data generation from years of engineering investment to a compute budget.

The scenario library advantage is particularly durable. Waymo has spent more than a decade building a library of edge cases, rare events, and corner scenarios — each tagged, categorized, and continuously added to as the real-world fleet encounters new situations. A competitor entering the simulation race today would need to engineer from scratch all the edge cases that Waymo has already catalogued, in addition to building the physics simulation infrastructure. This creates a compounding moat that grows larger with each year of operation.

Section 6 — What Simulation Capability Means for the Ramp Benchmark

In the Physical AI ramp benchmark, simulation capability is a leading indicator rather than a lagging one. The ability to simulate at scale predicts future safety performance improvement rates — a company with strong simulation infrastructure today will be able to iterate policy faster and cover more edge cases in the next 12 to 24 months than a company with weaker simulation.

The benchmark implication: when evaluating AV programs, simulation metrics — simulated miles per year, real-to-sim loop speed, scenario library depth — are as important as current real-world safety statistics. A program with excellent current safety statistics but weak simulation infrastructure faces a ceiling on how fast it can continue improving. A program with strong simulation infrastructure that is still ramping real-world miles may have better long-term improvement velocity.

The competitive picture as of mid-2026 (est.):

Waymo holds the simulation depth advantage: decade-plus Carcraft investment, approximately 20B simulated miles per year (est.), and active research into neural scene reconstruction to close the sim-to-real gap
Tesla holds the real-data scale advantage — 6M vehicle fleet generating continuous real-world video, Dojo purpose-built for processing it
Nvidia is democratizing access via COSMOS — a neural world simulator that could allow smaller programs to generate photorealistic synthetic data without Waymo-scale infrastructure investment

The race to eliminate the sim-to-real gap via neural world models, including Waabi’s UniSim, Wayve’s GAIA, Nvidia’s COSMOS, and competing efforts at Waymo, Motional, and others, is the frontier that will determine whether the simulation advantage remains concentrated in a small number of incumbents or becomes a commodity that any AV program can access.

Section 7 — About This Series

This is article 74 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA, consumer demand, competitive moats, Cybercab versus Model Y, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost of Physical AI, the accessibility layer for elderly and disabled users, the mapping architecture comparison, and the China AV race.

This article adds the simulation dimension: the synthetic training infrastructure that allows AV leaders to accumulate training experience faster than any real-world fleet can generate it — and the frontier of neural world models that may reshape who holds the simulation advantage over the next five years.

Note: Simulated mile estimates, fleet sizes, chip specifications, and competitive assessments, including those labeled “(est.)”, reflect publicly available information, company disclosures, and industry analysis where available. This article does not constitute investment advice.